[57] ABSTRACT

An encoding apparatus includes an encoder for encoding an 
alpha-map signal for discriminating a background from an 
object of an input picture in motion compensation prediction 
(MV)+transform encoding which uses MV in a domain of 
each of NxN transform coefficients (n), a transform circuit 
for transforming Pf into n in accordance with the alpha-map 
signal, an inverse transform circuit for reconstructing Pf by 
inversely transforming n in accordance with the alpha-map 
signal, a selector for obtaining a motion compensation 
prediction value (p) in the mth layer (m=2 to M) by 
switching p in the mth layer and p in the (m-1)th layer for 
each n, the selector selecting p in the mth layer for n by 
which a quantized output (Q) in the (m-1)th layer is 0 and 
selecting p in the (m-1)th layer for n by which Q=1 or more, 
an adder for calculating a difference df between a prediction 
error signal in the mth layer and a dequantized output in the 
(m-1)th layer, and an encoder for encoding and outputting 
the quantized signal of df. This encoding apparatus realizes 
SNR scalability in M layers. 
VIDEO ENCODING AND DECODING 
APPARATUS 

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

The present invention relates to video encoding and 
decoding apparatuses for encoding a picture signal at a high 
efficiency and transmitting or storing the encoded signal and, 
more particularly, to video encoding and decoding apparatuses 
with a scalable function capable of scalable coding by 
which the resolution and the image quality can be changed 
into multiple layers. 

2. Description of the Related Art 

Generally, a picture signal is compression-encoded before 
being transmitted or stored because the signal has an enormous 
amount of information. To encode a picture signal at 
a high efficiency, pictures whose unit is a frame are divided 
into a plurality of blocks in units of a predetermined number 
of pixels. Orthogonal transform is performed for each block 
to separate the spacial frequency of a picture into frequency 
components. Each frequency component is obtained as a 
transform coefficient and encoded. 

As one function of video encoding, a scalability function 
is demanded by which the image quality (SNR: Signal to 
Noise Ratio), the spacial resolution, and the time resolution 
can be changed step by step by partially decoding a bit 
stream. 

The scalability function is incorporated into Video Part 
(IS13818-2) of MPEG2 which is standardized in ISO/IEC. 

This scalability is realized by hierarchical encoding meth- 
ods. The scalability includes an encoder and a decoder of 
SNR scalability and also includes an encoder and a decoder 
of spacial scalability. 

In the encoder, layers are divided into a base layer (lower 
layer) whose image quality is low and an enhancement layer 
(upper layer) whose image quality is high. 

In the base layer, data is encoded by MPEG1 or MPEG2. 
In the enhancement layer, the data encoded by the base layer 
is reconstructed and the reconstructed base layer data is 
subtracted from the enhancement layer data. Only the resulting 
error is quantized by a quantization step size smaller than 
the quantization step size in the base layer and encoded. That 
is, the data is more finely quantized and encoded. The 
resolution can be increased by adding the enhancement layer 
information to the base layer information, and this makes the 
transmission and storage of high-quality pictures feasible. 

As described above, pictures are divided into the base 
layer and the enhancement layer, data encoded by the base 
layer is reconstructed, the reconstructed data is subtracted 
from the original data, and only the resulting error is 
quantized by a quantization step size smaller than the 
quantization step size in the base layer and encoded. 
Consequently, pictures can be encoded and decoded at a 
high resolution. This technique is called SNR scalability. 
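
As an illustration only, the two-layer quantization described above can be sketched in Python as follows. The sketch is not part of the described apparatus: it operates on a plain list of coefficient values, omits the transform and the motion compensation, and the function names, the uniform rounding quantizer, and the step sizes (16 and 4) are assumptions chosen for the example.

```python
def quantize(value, step):
    """Uniform quantizer: index of the nearest multiple of `step`."""
    return int(round(value / step))


def dequantize(level, step):
    """Reconstruct the value represented by a quantization index."""
    return level * step


def encode_two_layers(coeffs, base_step, enh_step):
    """Return (base_levels, enh_levels) for a list of coefficient values."""
    base_levels = [quantize(c, base_step) for c in coeffs]
    base_recon = [dequantize(q, base_step) for q in base_levels]
    # Only the error left by the base layer is quantized, with a finer step.
    enh_levels = [quantize(c - r, enh_step) for c, r in zip(coeffs, base_recon)]
    return base_levels, enh_levels


def decode(base_levels, enh_levels, base_step, enh_step, use_enhancement=True):
    """Reconstruct from the base layer alone, or from both layers."""
    recon = [dequantize(q, base_step) for q in base_levels]
    if use_enhancement:
        recon = [r + dequantize(e, enh_step) for r, e in zip(recon, enh_levels)]
    return recon


coeffs = [100.0, -37.0, 12.0, 3.0]
base_levels, enh_levels = encode_two_layers(coeffs, base_step=16, enh_step=4)
coarse = decode(base_levels, enh_levels, 16, 4, use_enhancement=False)
fine = decode(base_levels, enh_levels, 16, 4)
```

In this sketch, decoding the base layer alone gives a coarse reconstruction, while adding the enhancement layer brings each value to within half of the smaller step size of the original.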

In the encoder, an input picture is supplied to the base 
layer and the enhancement layer. In the base layer, the input 
picture is so processed as to obtain an error from a motion 
compensation prediction value obtained from a picture of 
the previous frame, and the error is subjected to orthogonal 
transform (DCT). The transform coefficient is quantized and 
variable-length-encoded to obtain a base layer output. The 
quantized output is dequantized, subjected to inverse DCT, 
and added with the motion compensation prediction value of 
the previous frame, thereby obtaining a frame picture. 
Motion compensation prediction is performed on the basis 
of this frame picture to obtain the motion compensation 
prediction value of the previous frame. 

In the enhancement layer, on the other hand, the input 
picture is delayed until the prediction value is obtained from 
the base layer, and processing is performed to obtain an error 
from a motion compensation prediction value in the 
enhancement layer obtained from the picture of the previous 
frame. The error is then subjected to orthogonal transform 
(DCT), and the transform coefficient is corrected by using 
the dequantized output from the base layer, quantized, and 
variable-length-encoded, thereby obtaining an enhancement 
layer output. The quantized output is dequantized, added 
with the motion compensation prediction value of the pre- 
vious frame obtained in the base layer, and subjected to 
inverse DCT. A frame picture is obtained by adding to the 
result of the inverse DCT the motion compensation predic- 
tion value of the previous frame obtained in the enhance- 
ment layer. Motion compensation prediction is performed on 
the basis of this frame picture to obtain a motion compen- 
sation prediction value of the previous frame in the enhance- 
ment layer. 

In this way, video pictures can be encoded by using the 
SNR scalability. Note that although this SNR scalability is 
expressed by two layers, various SNR reconstructed pictures 
can be obtained by increasing the number of layers. 

In the decoder, the variable-length encoded data of the 
enhancement layer and the variable-length encoded data of 
the base layer which are separately supplied are separately 
variable-length-decoded and dequantized. The two dequan- 
tized data are added, and the result is subjected to inverse 
DCT. The picture signal is restored by adding the motion 
compensation prediction value of the previous frame to the 
result of the inverse DCT. Also, motion compensation 
prediction is performed on the basis of a picture in an 
immediately previous frame obtained from the restored 
picture signal, thereby obtaining a motion compensation 
prediction value of the previous frame. 
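
The decoder-side steps above can be illustrated by the following Python sketch. It is a simplification made for illustration and not the described decoder: a one-dimensional orthonormal inverse DCT over four samples stands in for the two-dimensional inverse DCT, variable-length decoding is assumed to have been performed already, and all names, step sizes and sample values are hypothetical.

```python
import math


def idct_1d(coeffs):
    """Orthonormal inverse DCT (DCT-III) of a list of coefficients."""
    n = len(coeffs)
    samples = []
    for x in range(n):
        total = coeffs[0] / math.sqrt(n)
        for k in range(1, n):
            total += (math.sqrt(2.0 / n) * coeffs[k]
                      * math.cos(math.pi * (2 * x + 1) * k / (2 * n)))
        samples.append(total)
    return samples


def decode_block(base_levels, enh_levels, base_step, enh_step, prediction):
    """Reconstruct one block from the quantized levels of both layers."""
    # Dequantize each layer separately, then add the two dequantized data.
    coeffs = [b * base_step + e * enh_step
              for b, e in zip(base_levels, enh_levels)]
    # The inverse transform gives the prediction error in the pixel domain.
    error = idct_1d(coeffs)
    # Adding the motion compensation prediction value restores the picture.
    return [p + e for p, e in zip(prediction, error)]


# A block whose only nonzero coefficient is the DC term.
restored = decode_block([2, 0, 0, 0], [1, 0, 0, 0], 8, 2, [100.0] * 4)
```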

The foregoing are examples of encoding and decoding 
using the SNR scalability. 

On the other hand, the spacial scalability is done on the 
basis of the spacial resolution, and encoding is separately 
performed in a base layer whose spacial resolution is low 
and an enhancement layer whose spacial resolution is high. 
In the base layer, encoding is performed by using a normal 
MPEG2 encoding method. In the enhancement layer, 
up-sampling (in which a high-resolution picture is formed 
by adding pixels such as average values between pixels of a 
low-resolution picture) is performed for the picture from the 
base layer to thereby form a picture having the same size as 
the enhancement layer. Prediction is adaptively performed 
on the basis of motion compensation prediction using the 
picture of the enhancement layer and motion compensation 
prediction using the up-sampled picture. Consequently, 
encoding can be performed at a high efficiency. 
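
By way of a hedged illustration, the up-sampling mentioned above can be sketched in Python. The sketch assumes a factor of two in each direction, inserts the average of neighbouring pixels, and repeats the last pixel at the border; these choices and the function names are assumptions made for the example and are not prescribed by MPEG2 or by this description.

```python
def upsample_row(row):
    """Double the length of a row by inserting averages between pixels."""
    out = []
    for i, pixel in enumerate(row):
        # At the border there is no right-hand neighbour, so repeat the pixel.
        neighbour = row[i + 1] if i + 1 < len(row) else pixel
        out.append(pixel)
        out.append((pixel + neighbour) / 2)
    return out


def upsample_2x(picture):
    """Up-sample a picture (list of rows) by two horizontally and vertically."""
    rows = [upsample_row(r) for r in picture]
    # Apply the same interpolation to each column of the widened picture.
    cols = [upsample_row(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


low_res = [[10, 20],
           [30, 40]]
high_res = upsample_2x(low_res)
```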

The spacial scalability exists in order to achieve backward 
compatibility by which, for example, a portion of a bit 
stream of MPEG2 can be extracted and decoded by MPEG1. 
That is, the spacial scalability is not a function capable of 
reconstructing pictures with various resolutions (reference: 
"Special Edition MPEG", Television Magazine, Vol. 49, No. 
4, pp. 458-463, 1995). 

More specifically, the video encoding technology of 
MPEG2 aims to accomplish high-efficiency encoding of 
high-quality pictures and high-quality reconstruction of the 
encoded pictures. In this technology, pictures faithful to 
encoded pictures can be reconstructed. 

Unfortunately, with the spread of multimedia, there is a 
demand for a reconstructing apparatus capable of fully 
decoding data of high-quality pictures encoded at a high 
efficiency, as a system on the reconstruction side. In 
addition, there are demands for a system such as a portable 
system which is only required to reconstruct pictures regardless 
of whether the image quality is high, and for a simplified 
system by which the system price is decreased. 

To meet these demands, a picture is divided into, e.g., 8x8 
pixel matrix blocks and DCT is performed in units of blocks. 
In this case, 8x8 transform coefficients are obtained. 
Although it is originally necessary to decode the data from 
the first low frequency component to the eighth low frequency 
component, the data is decoded from the first low 
frequency component to the fourth low frequency component 
or from the first low frequency component to the sixth 
low frequency component. In this manner decoding is simplified 
by restoring the picture by reconstructing the signal 
of 4x4 resolution or the signal of 6x6 resolution, rather than 
the signal of 8x8 resolution. 

Unfortunately, when a picture which originally has 8x8 
information is restored by using 4x4 or 6x6 information, a 
mismatch occurs between the restored value and the motion 
compensation prediction value, and errors are accumulated. 
This significantly degrades the picture. Therefore, it is an 
important subject to overcome this mismatch between the 
encoding side and the decoding side. 

Note that as a method of converting the spacial resolution 
in order to control the difference between the spacial resolutions 
on the encoding side and the decoding side, there is 
another method, although the method is not standardized, by 
which the spacial resolution is made variable by inversely 
converting some coefficients of orthogonal transform (e.g., 
DCT (Discrete Cosine Transform)) by an order smaller than 
the original order. 

Unfortunately, when motion compensation prediction is 
performed by using the resolution-converted picture, image 
quality degradation called a drift resulting from the motion 
compensation prediction occurs in the reconstructed picture 
(reference: Iwahashi et al., "Motion Compensation for 
Reducing Drift in Scalable Decoder", Shingaku Giho IE94-97, 
1994). 

Accordingly, the method has a problem as a technique to 
overcome the mismatch between the encoding side and the 
decoding side. 

On the other hand, the spacial scalability exists in order to 
achieve backward compatibility by which, for example, a 
portion of a bit stream of MPEG2 can be extracted and 
decoded by MPEG1. That is, the spacial scalability is not a 
function capable of reconstructing pictures with various 
resolutions (reference: "Special Edition MPEG", Television 
Magazine, Vol. 49, No. 4, pp. 458-463, 1995). Since hierarchical 
encoding is performed to realize the scalability 
function as described above, information is divisionally 
encoded and this decreases the coding efficiency. 

A video encoding system belonging to a category called 
mid-level encoding is proposed in J. Y. A. Wang et al., 
"Applying Mid-level Vision Techniques for Video Data 
Compression and Manipulation", M.I.T. Media Lab. Tech. 
Report No. 263, February 1994. 

In this system, a background and an object are separately 
encoded. To separately encode the background and the 
object, an alpha-map signal which represents the shape of 
the object and the position of the object in a frame is 
necessary. An alpha-map signal of the background can be 
uniquely obtained from the alpha-map signal of the object. 

In an encoding system like this, a picture with an arbitrary 
shape must be encoded. As a method of encoding an 
arbitrary-shape picture, there is an arbitrary-shape picture 
signal orthogonal transform method described in previously 
filed Japanese Patent Application No. 7-97073. In this 
orthogonal transform method, the values of pixels contained 
in a specific domain are separated from an input edge block 
signal by a separation circuit (SEP), and an average value 
calculation circuit (AVE) calculates an average value a of the 
separated pixel values. 

If an alpha-map indicates a pixel in the specific domain, 
a selector (SEL) outputs the pixel value in the specific 
domain stored in a block memory (MEM). If the alpha-map 
indicates another pixel, the selector outputs the average 
value a. The block signal thus processed is subjected to 
two-dimensional DCT to obtain transform coefficients for 
pixels in the specific domain. 

On the other hand, inverse transform is accomplished by 
separating the pixel values in the specific domain from pixel 
values in the block obtained by performing inverse DCT for 
the transform coefficient. 

As described above, in the scalable encoding method 
capable of dividing pictures into multiple layers, the coding 
efficiency is sometimes greatly decreased when video pictures 
are encoded. In addition, scalable encoding by which 
the resolution and the image quality can be made variable is 
also required in an arbitrary-shape picture encoding apparatus 
which separately encodes the background and the 
object. It is also necessary to improve the efficiency of 
motion compensation prediction encoding for an arbitrary-shape 
picture. 

On the other hand, the mid-level encoding system has the 
advantage that a method of evenly arranging the internal 
average value of the object in the background can be realized 
with a few calculations. However, a step of pixel values is 
sometimes formed in the boundary between the object and 
the background. If DCT is performed in a case like this, a 
large quantity of high-frequency components are generated 
and so the amount of codes is not decreased. 

SUMMARY OF THE INVENTION 

It is an object of the present invention to provide an 
encoding apparatus and a decoding apparatus capable of 
improving the coding efficiency when video pictures are 
encoded by a scalable encoding method by which pictures 
can be divided into multiple layers. 

It is another object of the present invention to provide a 
scalable encoding apparatus and a scalable decoding apparatus 
capable of making the resolution and the image quality 
variable and improving the coding efficiency in an arbitrary-shape 
picture encoding apparatus which separately encodes 
a background and an object. 

It is still another object of the present invention to 
improve the efficiency of motion compensation prediction 
encoding for arbitrary-shape pictures. 

It is still another object of the present invention to 
alleviate the drawback that the code amount is not decreased 
due to the generation of a large quantity of high-frequency 
components when DCT is performed, even if a step of pixel 
values is formed in the boundary between an object and a 
background when a method of evenly arranging an internal 
average value of the object in the background is used. 

According to the present invention, there is provided a 
video encoding apparatus comprising: an orthogonal transform 
circuit for orthogonally transforming an input picture 
signal to obtain a plurality of transform coefficients; a first 
local decoder for outputting first transform coefficients for a 
fine motion compensation prediction picture on the basis of 
a previous picture; a second local decoder for outputting 
second transform coefficients for a coarse motion compensation 
prediction picture on the basis of a current picture 
corresponding to the input picture signal; means for detecting 
a degree of motion compensation prediction in the 
second local decoder; a selector for selectively outputting 
the first and second transform coefficients in accordance 
with the degree of motion compensation prediction; a first 
calculator for calculating a difference between the transform 
coefficients of the orthogonal transform circuit and ones of 
the first and second transform coefficients which are selected 
by the selector, and outputting a motion compensation 
prediction error signal; a first quantizer for quantizing the 
motion compensation prediction error signal from the first 
adder and outputting a first quantized motion compensation 
prediction error signal; a second calculator for calculating a 
difference between the second transform coefficients from 
the second local decoder and the transform coefficients from 
the orthogonal transform circuit, and outputting a second 
motion compensation prediction error signal; a second quantizer 
for quantizing the motion compensation prediction 
error signal from the second calculator, and outputting a 
second quantized motion compensation prediction error 
signal; and an encoder for encoding the first and second 
quantized motion compensation prediction error signals and 
outputting encoded signals. 

According to the present invention, there is provided a 
video encoding apparatus comprising: an orthogonal transform 
circuit for dividing an input video signal into a plurality 
of blocks each containing NxN pixels and orthogonally 
transforming the input video signal in units of blocks to 
obtain a plurality of transform coefficients divided in spacial 
frequency bands; a first motion prediction processing section 
for performing motion compensation prediction processing 
for the plurality of transform coefficients in order to obtain 
an upper-layer motion compensation prediction signal having 
the number of data enough to obtain a high image 
quality; a second motion prediction processing section for 
performing motion compensation prediction processing for 
the plurality of transform coefficients in order to obtain a 
lower-layer motion compensation prediction signal upon 
reducing the number of data; a decision section for deciding 
in motion compensation on the basis of the lower-layer 
motion compensation prediction signal whether motion 
compensation prediction is correct; a selector for selecting 
the upper-layer motion compensation prediction signal in 
response to a decision representing a correct motion compensation 
prediction from the decision section, and the 
lower-layer motion compensation prediction signal in 
response to a decision representing an incorrect motion 
compensation prediction; and an encoder for encoding one 
of the upper-layer motion compensation prediction signal 
and the lower-layer motion compensation prediction signal 
which is selected by the selector. 

According to the present invention, there is provided a 
video encoding apparatus for realizing SNR scalability in M 
layers, comprising: an orthogonal transform circuit for 
dividing an input video signal into a plurality of blocks each 
containing NxN pixels and orthogonally transforming the 
input video signal in units of blocks to obtain a plurality of 
transform coefficients divided in spacial frequency bands; a 
first motion compensation prediction processing section for 
performing motion compensation prediction processing for 
the plurality of transform coefficients in order to obtain an 
mth-layer (m=2 to M) motion compensation prediction 
signal; a second motion compensation prediction processing 
section for performing motion compensation prediction processing 
for the plurality of transform coefficients in order to 
obtain an (m-1)th-layer motion compensation prediction 
signal; switching means for selecting the mth-layer motion 
compensation prediction signal of the first motion compensation 
prediction processing section in order to obtain an 
mth-layer prediction value when a quantized output from the 
second motion compensation prediction processing section 
is 0, and switching between the mth-layer motion compensation 
prediction signal and the (m-1)th-layer motion compensation 
prediction signal in units of transform coefficients 
in order to select the (m-1)th-layer motion compensation 
prediction signal when the quantized output is not less than 
1; means for calculating a difference signal between an 
(m-1)th-layer dequantized output from the second motion 
compensation prediction processing section and an mth-layer 
motion compensation prediction error signal obtained 
by a difference between the mth-layer motion compensation 
prediction signal and the transform coefficient from the 
orthogonal transform circuit; and encoding means for quantizing 
and encoding the difference signal to output an 
encoded bit stream. 

According to the present invention, there is provided a 
video encoding/decoding system comprising: a video encoding 
apparatus for realizing SNR (Signal to Noise Ratio) 
scalability in M layers, which includes an orthogonal transform 
circuit for dividing an input video signal into a plurality 
of blocks each containing NxN pixels and orthogonally 
transforming the input video signal in units of blocks to 
obtain a plurality of transform coefficients divided in spacial 
frequency bands, a first motion compensation prediction 
processing section for performing motion compensation 
prediction processing for the plurality of transform coefficients 
in order to obtain an mth-layer (m=2 to M) motion 
compensation prediction signal, a second motion compensation 
prediction processing section for performing motion 
compensation prediction processing for the plurality of 
transform coefficients in order to obtain an (m-1)th-layer 
motion compensation prediction signal, switching means for 
selecting the mth-layer motion compensation prediction 
signal of the first motion compensation prediction processing 
section in order to obtain an mth-layer prediction value 
when a quantized output from the second motion compensation 
prediction processing section is 0, and switching 
between the mth-layer motion compensation prediction signal 
and the (m-1)th-layer motion compensation prediction 
signal in units of transform coefficients in order to select the 
(m-1)th-layer motion compensation prediction signal when 
the quantized output is not less than 1, means for calculating 
a difference signal between an (m-1)th-layer dequantized 
output from the second motion compensation prediction 
processing section and an mth-layer motion compensation 
prediction error signal obtained by a difference between the 
mth-layer motion compensation prediction signal and the 
transform coefficient from the orthogonal transform circuit, 
and encoding means for quantizing and encoding the difference 
signal to output an encoded bit stream; and a video 
decoding apparatus which includes means for extracting 
codes up to a code in the mth (m=2 to M) layer from the 
encoded bit stream from the video encoding apparatus, 
decoding means for decoding the codes of respective layers 
up to the mth layer, dequantization means for dequantizing, 
in the respective layers, the quantized values decoded by the 
decoding means, switching means for switching the mth-layer 
(m=2 to M) motion compensation prediction value and 
the (m-1)th-layer motion compensation prediction value in 
units of transform coefficients, and outputting the mth-layer 
motion compensation prediction value for the quantized 
output of 0 in the (m-1)th layer and the (m-1)th-layer 
motion compensation prediction value for the quantized 
output of not less than 1 in the (m-1)th layer in units of 
transform coefficients in order to obtain the mth-layer prediction 
value, and means for adding the mth-layer motion 
compensation prediction value and the (m-1)th-layer 
motion compensation prediction value to reconstruct the 
mth-layer motion compensation prediction error signal. 

According to the present invention, there is provided a 
video encoding apparatus comprising: an orthogonal transform 
circuit for dividing an input video signal into a plurality 
of blocks each containing NxN pixels and orthogonally 
transforming an arbitrary-shape picture in units of blocks to 
obtain a plurality of transform coefficients; means for encoding 
and outputting an alpha-map signal for discriminating a 
background of a picture from an object thereof; means for 
calculating an average value of pixel values of an object 
portion using the alpha-map signal in units of blocks; means 
for assigning the average value to a background portion of 
the block; means for deciding using the alpha-map signal 
whether a pixel in the object is close to the background; 
means for compressing, about the average value, the pixel in 
the object decided to be close to the background; and means 
for orthogonally transforming each block to output an 
orthogonal transform coefficient. 

Additional objects and advantages of the invention will be 
set forth in the description which follows, and in part will be 
obvious from the description, or may be learned by practice 
of the invention. The objects and advantages of the invention 
may be realized and obtained by means of the instrumentalities 
and combinations particularly pointed out in the 
appended claims. 

BRIEF DESCRIPTION OF THE DRAWINGS 

The accompanying drawings, which are incorporated in 
and constitute a part of the specification, illustrate presently 
preferred embodiments of the invention and, together with 
the general description given above and the detailed description 
of the preferred embodiments given below, serve to 
explain the principles of the invention. 

FIG. 1 is a block diagram for explaining the present 
invention, showing the configuration of an encoding apparatus 
according to the first embodiment of the present 
invention; 

FIG. 2 is a view for explaining the present invention, 
which explains a prediction value switching method to be 
applied to the present invention; 

FIGS. 3A and 3B are block diagrams for explaining the 
present invention, showing the configurations of motion 
compensation prediction sections according to the first 
embodiment of the present invention; 

FIG. 4 is a block diagram for explaining the present 
invention, showing the configuration of a decoding apparatus 
according to the first embodiment of the present invention; 

FIG. 5 is a block diagram for explaining the present 
invention, showing the configuration of an encoding apparatus 
according to the second embodiment of the present 
invention; 

FIGS. 6A and 6B are block diagrams for explaining the 
present invention, showing the configurations of motion 
compensation prediction sections according to the second 
embodiment of the present invention; 

FIG. 7 is a block diagram for explaining the present 
invention, showing the configuration of a decoding apparatus 
according to the second embodiment of the present 
invention; 

FIGS. 8A and 8B are block diagrams for explaining the 
present invention, showing the configurations of motion 
compensation prediction sections according to the third 
embodiment of the present invention; 

FIG. 9 is a view for explaining the present invention, 
which illustrates an example of a quantization matrix used in 
the present invention; 

FIG. 10 is a view for explaining the present invention, 
which illustrates an example of a matrix used in 
the present invention; 

FIG. 11 shows an example of a quantization matrix 
obtained for the example shown in FIG. 2; 

FIG. 12 shows an example of a two-dimensional matrix 
which is divided into eight portions in each of a horizontal 
direction (h) and a vertical direction (v); 

FIG. 13 shows a scan order for the example shown in FIG. 
2; 

FIG. 14 is a view for explaining the present invention, 
which explains the fourth embodiment of the present invention; 

FIGS. 15A, 15B and 15C are views for explaining an 
example of a video transmission system to which the video 
encoding apparatus and the video decoding apparatus 
according to the present invention are applied; 

FIG. 16 is a view for explaining a modification of the 
second embodiment of the present invention, which is a 
graph showing an example in which an average value is 
arranged in a background; 

FIG. 17 is a view for explaining another modification of 
the second embodiment of the present invention, which is a 
graph for explaining an example in which a step is 
decreased; 

FIG. 18 is a view for explaining still another modification 
of the second embodiment of the present invention, which 
illustrates examples of block pixel values; 

FIG. 19 is a view for explaining still another modification 
of the second embodiment of the present invention, which is 
a graph for explaining another example in which a step is 
decreased; 

FIG. 20 is a view for explaining still another modification 
of the second embodiment of the present invention, which 
illustrates examples of block pixel values; 

FIG. 21 is a block diagram showing an example of an 
encoding apparatus as still another modification of the 
second embodiment of the present invention; and 

FIG. 22 is a block diagram showing an example of a 
decoding apparatus as still another modification of the 
second embodiment of the present invention. 

DETAILED DESCRIPTION OF THE 
PREFERRED EMBODIMENTS 

In the present invention, when motion compensation is to 
be performed in a transform coefficient domain in units of 
NxN transform coefficients, encoding in an upper layer 
(enhancement layer) is performed on the basis of an already 
decoded and quantized value of a lower layer (base layer). 
This realizes an encoding system which can perform encoding 
with little decrease in the encoding efficiency. 

Also, in the above encoding apparatus of the present 
invention, orthogonal transform can be performed for a 
picture domain with an arbitrary shape in accordance with 
an alpha-map signal indicating the arbitrary-shape picture 
domain. Consequently, a reconstructed picture with a variable 
image quality can be obtained for an arbitrary-shape 
picture. 

In the present invention, a frame memory is prepared for 
each of a background and one or more objects, and motion 
compensation prediction is performed for each of the back- 
ground and the objects. This improves the efficiency of 
prediction for a portion hidden by overlapping of the objects. 

Furthermore, the efficiency of motion compensation pre- 
dictive encoding is improved by decreasing the range of 
motion vector detection in the boundary of an object. 

Embodiments of the present invention will be described 
below with reference to the accompanying drawings. 

The first embodiment of the present invention will be 
described with reference to FIGS. 1, 2, 3 A, 3B, and 4. This 
embodiment is related to an encoding apparatus and a 
decoding apparatus which realize SNR scalability of M 
layers as a whole. The coding efficiency in the mth layer is 
improved by adaptively switching a motion compensation 
prediction signal in the mth layer and a motion compensation 
prediction signal in the (m-1)th layer. In the accompanying 
drawings, a base layer corresponds to the (m-1)th 
layer and an enhancement layer corresponds to the mth 
layer. 

In the encoding apparatus shown in FIG. 1, an input signal 
is input to an orthogonal transform circuit, e.g., DCT circuit 
100. The output terminal of the DCT circuit is connected to 
the input terminals of adders 110 and 111. The other input 
terminal of the adder 110 is connected to a selector 300. The 
output terminal of the adder 110 is connected to a quantizer
130 via an adder 120. The output terminal of the quantizer
130 is connected to an output buffer 160 via a variable-
length encoder 140 and a multiplexer 150.

The output terminal of the quantizer 130 is connected to 
a motion compensation prediction section (MCP) 200 via a 
dequantizer 170 and adders 180 and 190. The output of the 
motion compensation prediction section 200 is connected 
selectively to the adders 110 and 120 by the selector 300. 
The encoding controller 400 controls the quantizer 130 and 
the variable-length encoder 140 in accordance with the 
output signal from the output buffer 160. 

The output terminal of the adder 111 is connected to the
input terminal of the quantizer 131, the output terminal of
which is connected to the output buffer 161
via a variable-length encoder 141 and a multiplexer 151.

The output terminal of the quantizer 131 is connected to 
a motion compensation prediction section 201 of the 
enhancement layer via a dequantizer 171 and an adder 191. 
The output terminal of the motion compensation prediction 
section 201 is connected to the selector 300 and adders 111 
and 191. The encoding controller 410 controls the quantizer
131 and the variable-length encoder 141 in accordance with
the output signal from the output buffer 161. A motion vector
detector 500 receives the input video signal 10 and is 
connected to the motion compensation prediction section 
200, motion compensation prediction section 201 and 
variable-length encoder 141. 

The DCT circuit 100 performs orthogonal transform 
(DCT) for an input picture signal 10 to obtain transform 
coefficients of individual frequency components. The adder 
110 calculates the difference between the transform coeffi- 
cient from the DCT circuit 100 and one of an output (EMC) 
from the enhancement layer motion compensation predic- 
tion section 200 and an output (BMC) from the base layer 
motion compensation prediction section 201 which are
selectively supplied via the selector 300. The adder 120 
calculates the difference between an output from the adder 
110 and an output from the dequantizer 171. 

The quantizer 130 quantizes an output from the adder 120 
in accordance with a quantization scale supplied from the 
encoding controller 400. The variable-length encoder 140
performs variable-length encoding for the quantized output 
from the quantizer 130 and side information such as the 
quantization scale supplied from the encoding controller 
400. 

The multiplexer 150 multiplexes the variable-length code 
of the quantized output and the variable-length code of the 
side information supplied from the variable-length encoder 
140. The output buffer 160 temporarily holds and outputs the 
data stream multiplexed by the multiplexer 150. 

The encoding controller 400 outputs information of an
optimum quantization scale Q_scale on the basis of buffer
capacity information from the buffer 160. The encoding
controller 400 also supplies this information of the quanti- 
zation scale Q_scale to the variable-length encoder 140 as 
the side information, thereby causing the quantizer 130 to 
perform quantization and the variable-length encoder 140 to 
perform variable-length encoding. 

The dequantizer 170 dequantizes the quantized output 
from the quantizer 130 and outputs the result. The adder 180 
adds the output from the dequantizer 170 and the output 
from the dequantizer 171. The adder 190 adds the output 
from the adder 180 and a compensation prediction value 
selectively output from the selector 300. 

The motion compensation prediction section 200 calcu-
lates a motion compensation prediction value in the
enhancement layer on the basis of the output from the adder
190 and a motion vector detected by the motion vector
detector 500. When receiving the motion compensation
prediction value calculated by the motion compensation 
prediction section 200 and the motion compensation predic- 
tion value calculated by the motion compensation prediction 
section 201, the selector 300 selectively outputs one of these 
motion compensation prediction values in accordance with 
an output from a binarizing circuit 310. 

In the above configuration, the adder 110, the adder 120, 
the quantizer 130, the variable-length encoder 140, the 
multiplexer 150, the output buffer 160, the dequantizer 170, 
the adder 180, the adder 190, the motion compensation 
prediction section 200, the selector 300, and the encoding 
controller 400 constitute the enhancement layer. The dequan-
tizer 170, the adder 180, the adder 190 and the motion
compensation prediction section 200 construct a local
decoder of the enhancement layer.

The motion vector detector 500 described above receives 
the same picture signal as the input picture signal to the DCT 
circuit 100 and detects a motion vector from this picture 
signal. On the basis of the motion vector supplied from the 
motion vector detector 500 and the sum output from the 
adder 191, the motion compensation prediction section 201 
performs motion compensation prediction and obtains a 
motion compensation prediction value (BMC) which is 
converted into a DCT coefficient. 

The adder 111 calculates the difference between the 
output transform coefficient from the DCT circuit 100 and 
the output motion compensation prediction value (BMC) 
from the motion compensation prediction section 201. The 
quantizer 131 quantizes the output from the adder 111 in 
accordance with the quantization scale designated by the
encoding controller 410. 

The binarizing circuit 310 checks whether the quantized
value output from the quantizer 131 is "0". If the value is 
"0", the binarizing circuit 310 outputs "0". If the value is not 
"0", the binarizing circuit 310 outputs "1". The dequantizer 
171 performs dequantization in accordance with the quan- 
tization scale designated by the encoding controller 410. The 
adder 191 adds the output from the dequantizer 171 and the 
output from the motion compensation prediction section 201 
and supplies the sum to the motion compensation prediction 
section 201. 

The variable-length encoder 141 performs variable-length 
encoding for the quantized output from the quantizer 131 
and the side information such as the quantization scale 
supplied from the encoding controller 410. The multiplexer 
151 multiplexes the variable-length code of the quantized 
output and the variable-length code of the side information 
supplied from the variable-length encoder 141. The output 
buffer 161 temporarily holds and outputs the data stream 
multiplexed by the multiplexer 151. 

The encoding controller 410 outputs the information of 
the optimum quantization scale Q_scale on the basis of 
buffer capacity information from the buffer 161. The encod- 
ing controller 410 also supplies this information of the 
quantization scale Q_scale to the variable-length encoder 
141 as the side information, thereby causing the quantizer 
131 to perform quantization and the variable-length encoder
141 to perform variable-length encoding. 

The adder 111, the quantizer 131, the variable-length 
encoder 141, the multiplexer 151, the output buffer 161, the 
dequantizer 171, the adder 191, the motion compensation 
prediction section 201, the binarizing circuit 310, the encod- 
ing controller 410, and the motion vector detector 500 
constitute the base layer. The dequantizer 171, the adder 191 
and the motion compensation prediction section 201 con- 
stitute a local decoder. 

This apparatus with the above configuration operates as 
follows. 

The input picture signal 10 is supplied to the DCT circuit
100 and the motion vector detector 500. The motion vector 
detector 500 detects a motion vector from the picture signal 
10 and supplies the detected vector to the motion compen- 
sation prediction sections 200 and 201 and the variable- 
length encoder 141. 

The picture signal 10 input to the DCT circuit 100 is 
divided into blocks each having a size of NxN pixels and 
orthogonally transformed in units of NxN pixels by this 
DCT circuit 100. Consequently, NxN transform coefficients 
are obtained for each block. These transform coefficients are 
NxN transform coefficients obtained by separating the spa-
tial frequency components of the picture into components
ranging from a DC component to individual AC compo- 
nents. 
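
The block transform described above can be illustrated by the following minimal sketch. It assumes an orthonormal DCT-II, which is a common choice but is not specified in this description, and the function names (dct_matrix, dct2, idct2) and the 4x4 block size are illustrative only.

```python
import math

def dct_matrix(n):
    """Return the n x n orthonormal DCT-II basis matrix (assumed transform)."""
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos(math.pi * (2 * x + 1) * k / (2 * n))
             for x in range(n)] for k in range(n)]

def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def dct2(block):
    """Forward 2-D DCT of an N x N pixel block: C * X * C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))

def idct2(coeffs):
    """Inverse 2-D DCT: C^T * Y * C."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)

# A flat block has energy only in the DC coefficient.
flat = [[10.0] * 4 for _ in range(4)]
coeffs = dct2(flat)
restored = idct2(coeffs)
```

For a flat block, only the DC coefficient is non-zero, and the inverse transform returns the original pixels up to rounding error.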

These NxN transform coefficients obtained by the DCT 
circuit 100 are supplied to the adder 110 in the enhancement 
layer and the adder 111 in the base layer. 

In the base layer, the adder 111 calculates the difference 
between the transform coefficient and the motion compen- 
sation prediction value (BMC) which is converted into a 
DCT coefficient and supplied from the motion compensation 
prediction section 201, thereby obtaining a prediction error 
signal. This prediction error signal is supplied to the quan- 
tizer 131 to be quantized in accordance with the quantization 
scale Q_scale input by the encoding controller 410. The 
quantized prediction error signal is supplied to the variable- 
length encoder 141 and dequantizer 171. 

The variable-length encoder 141 performs variable-length 
encoding for the quantized prediction error signal, the side 
information such as the quantization scale supplied from the
encoding controller 410, and the motion vector information 
supplied from the motion vector detector 500. This variable- 
length-encoded output is supplied to the multiplexer 151 to 
be multiplexed thereby, and supplied to the output buffer 
161. The output buffer 161 outputs the multiplexed signal, as 
an encoded bit stream 21, to a transmission line or a storage 
medium. Also, the output buffer 161 feeds the capacity of the 
buffer back to the encoding controller 410. 

In accordance with the capacity information from the 
buffer, the encoding controller 410 controls the output from 
the quantizer 131 and outputs the quantization scale 
Q_scale to the quantizer 131. This information of the 
quantization scale Q_scale is also supplied to the variable-
length encoder 141 as the side information.

Since the encoding controller 410 controls the output 
from the quantizer 131 in accordance with the capacity 
information from the buffer, the encoding controller 410 can 
advance the quantization while controlling the quantization 
scale so that the output buffer 161 does not overflow. 
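
The buffer feedback described above can be sketched as follows. This description does not state how the encoding controller derives Q_scale from the buffer capacity information, so the linear mapping, the function name next_q_scale, and the range 1 to 31 are assumptions made purely for illustration.

```python
def next_q_scale(buffer_fullness, buffer_size, q_min=1, q_max=31):
    """Map buffer occupancy to a quantization scale.

    Hypothetical rule: a fuller buffer yields a coarser quantization
    scale, which lowers the bit rate so that the buffer does not overflow.
    """
    if buffer_size <= 0:
        raise ValueError("buffer_size must be positive")
    # Clamp the occupancy ratio to the range [0, 1].
    occupancy = min(max(buffer_fullness / buffer_size, 0.0), 1.0)
    return q_min + round(occupancy * (q_max - q_min))

empty_scale = next_q_scale(0, 1000)
half_scale = next_q_scale(500, 1000)
full_scale = next_q_scale(1000, 1000)
```

Any monotonically increasing mapping would serve the same purpose; the linear one is chosen here only because it is the simplest to read.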

The information of the quantization scale Q_scale is
variable-length-encoded as the side information by the 
variable-length encoder 141 and multiplexed by the multi- 
plexer 151. The multiplexed signal is used as the output from 
the video encoding apparatus. Consequently, the quantiza- 
tion scale used in dequantization when the video decoding 
apparatus performs decoding can be obtained. 

Meanwhile, the quantized value of the prediction error 
signal supplied to the dequantizer 171 is dequantized and 
supplied to the adder 191. The adder 191 adds the dequan- 
tized value to the motion compensation prediction value 
BMC and thereby calculates a reconstructed value in the 
transform coefficient domain. This value is supplied to the 
motion compensation prediction section 201. 
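
The base layer loop described above can be sketched per transform coefficient as follows. The quantizer rule (division by Q_scale with truncation toward zero) is an assumption, since this description does not define it, and the function names and sample values are illustrative only.

```python
def quantize(value, q_scale):
    """Uniform quantizer with truncation toward zero (assumed rule)."""
    return int(value / q_scale)

def dequantize(level, q_scale):
    """Inverse of the assumed uniform quantizer."""
    return level * q_scale

def base_layer_step(coeffs, bmc, q_scale):
    """Process one list of transform coefficients in the base layer.

    Returns the quantized values BQ, the dequantized values, and the
    reconstructed values in the transform coefficient domain.
    """
    bq, deq, recon = [], [], []
    for c, p in zip(coeffs, bmc):
        level = quantize(c - p, q_scale)   # adder 111, then quantizer 131
        d = dequantize(level, q_scale)     # dequantizer 171
        bq.append(level)
        deq.append(d)
        recon.append(p + d)                # adder 191
    return bq, deq, recon

bq, deq, recon = base_layer_step([100, 20, -7, 3], [90, 21, 9, 3], q_scale=8)
```

With these sample values, the first and third coefficients are poorly predicted and receive non-zero quantized values, while the other two quantize to zero.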

In the enhancement layer, the output EMC from the 
motion compensation prediction section 200 of the enhance- 
ment layer and the output BMC from the motion compen- 
sation prediction section 201 of the base layer are adaptively 
and selectively output for each transform coefficient. That is, 
on the basis of an output BQ from the quantizer 131 in the 
base layer, the selector 300 adaptively and selectively out- 
puts the output (EMC) from the motion compensation 
prediction section 200 of the enhancement layer and the 
output BMC from the motion compensation prediction sec- 
tion 201 of the base layer for each transform coefficient in 
accordance with a method to be described later. 

The adder 110 calculates a prediction error signal between 
the transform coefficient of the input picture supplied from 
the DCT circuit 100 and an output EP from the selector 300 
and supplies the signal to the adder 120. The adder 120 
calculates the difference between a signal 30 of the dequan- 
tized value BQ supplied from the dequantizer 171 and the 
output from the adder 110 and supplies the difference as a 
difference value output signal EC to the quantizer 130. This 
difference value output signal EC is the motion compensa- 
tion prediction error signal. 

The quantizer 130 quantizes the difference value output 
signal EC in accordance with the quantization scale Q_scale 
supplied from the encoding controller 400 and supplies the 
quantized signal to the variable-length encoder 140 and the
dequantizer 170. 

The variable-length encoder 140 performs variable-length
encoding for the quantized motion compensation prediction 
error signal together with the side information and supplies 
the encoded signals to the multiplexer 150. The multiplexer 
150 multiplexes these signals and supplies the multiplexed 
signal to the output buffer 160. 

The output buffer 160 outputs the multiplexed signal to a
transmission line or a storage medium as an encoded bit 
stream 20 for the enhancement layer. Also, the output buffer 
160 feeds the capacity of the buffer back to the encoding 
controller 400. 

The quantized value supplied to the dequantizer 170 is 
dequantized. The adder 180 adds the dequantized value to 
the output 30 supplied from the dequantizer 171 of the base 
layer, thereby reconstructing the prediction error signal. 

The adder 190 adds the prediction error signal recon- 
structed by the adder 180 to the motion compensation 
prediction value EMC and thereby calculates a reconstructed 
value in the transform coefficient domain. This reconstructed 
value is supplied to the motion compensation prediction 
section 200. 
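
The enhancement layer signal flow described above can be sketched per transform coefficient as follows. The sketch assumes a uniform quantizer with truncation toward zero, and it adds the selector output EP at the adder 190, following the structural description of that adder. The function name and sample values are illustrative only.

```python
def enhancement_layer_step(coeffs, emc, bmc, bq, base_deq, q_scale):
    """Process one list of transform coefficients in the enhancement layer.

    bq and base_deq are the quantized and dequantized base layer values.
    Returns the quantized enhancement values and the reconstructed values.
    """
    eq, recon = [], []
    for c, e, b, level, d in zip(coeffs, emc, bmc, bq, base_deq):
        ep = e if level == 0 else b      # selector 300: EMC where BQ is 0
        ec = (c - ep) - d                # adder 110, then adder 120
        q = int(ec / q_scale)            # quantizer 130 (assumed rule)
        eq.append(q)
        residual = q * q_scale + d       # dequantizer 170, then adder 180
        recon.append(ep + residual)      # adder 190
    return eq, recon

eq, recon = enhancement_layer_step(
    coeffs=[100, 20, -7, 3],
    emc=[97, 20, -5, 2],
    bmc=[90, 21, 9, 3],
    bq=[1, 0, -2, 0],
    base_deq=[8, 0, -16, 0],
    q_scale=2,
)
```

Where the base layer quantized value is non-zero, the enhancement layer encodes only the base layer quantization error, which is small compared with the original prediction error.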

FIG. 2 shows a switching unit described in reference (T. 
K. Tan et al., "A Frequency Scalable Coding Scheme 
Employing Pyramid and Subband Techniques", IEEE Trans. 
CAS for Video Technology, Vol. 4, No. 2, April 1994), 
which is an example of the switching unit optimally appli- 
cable to the selector 300. 

Referring to FIG. 2, the binarizing circuit 310 decides 
whether the value of the output BQ from the quantizer 131 
in the base layer is "0". This decision result is supplied to the 
selector 300. If the value of the output BQ from the quantizer 
131 is "0", the selector 300 selects the transform coefficient 
output EMC from the enhancement layer motion compen- 
sation prediction section 200. If the value is "1", the selector 
300 selects the transform coefficient output BMC from the 
base layer motion compensation prediction section 201. 
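
The switching rule of the binarizing circuit 310 and the selector 300 can be sketched as follows. The function names are illustrative, and the coefficients of one block are represented as a flat list for brevity.

```python
def binarize(bq):
    """Binarizing circuit 310: 0 where the quantized value is 0, else 1."""
    return [0 if level == 0 else 1 for level in bq]

def select_prediction(emc, bmc, flags):
    """Selector 300: take EMC where the flag is 0 and BMC where it is 1."""
    return [e if f == 0 else b for e, b, f in zip(emc, bmc, flags)]

flags = binarize([3, 0, -1, 0])
ep = select_prediction([10, 11, 12, 13], [20, 21, 22, 23], flags)
```

Because the decision depends only on the base layer quantized values, which are transmitted in the base layer bit stream, no additional side information is needed to signal the selection.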

That is, the binarizing circuit 310 outputs "0" when the 
value of the output BQ from the quantizer 131 in the base 
layer is "0" and outputs "1" when the value is not "0".
Therefore, the selector 300 is made to select EMC when the 
output from the binarizing circuit 310 is "0" and BMC when 
the output is "1". Consequently, the transform coefficient 
output EMC from the motion compensation prediction sec- 
tion 200 in the enhancement layer is applied to a transform 
coefficient in a position where the output BQ from the 
quantizer 131 is "0", and the transform coefficient output 
BMC from the motion compensation prediction section 201 
in the base layer is applied to a transform coefficient in a 
position where the output BQ from the quantizer 131 is not "0".

The quantizer 131 in the base layer receives the output 
from the adder 111 and quantizes this output from the adder 
111. The adder 111 receives the output from the DCT circuit
100 and the motion compensation prediction value obtained 
by the motion compensation prediction section 201 from a 
picture in an immediately previous frame, and calculates the 
difference between them. Therefore, if the calculated motion 
compensation prediction value is correct, the difference 
between the two values output from the adder 111 is "0". 

Accordingly, of the quantized values as the output BQ 
from the quantizer 131 in the base layer, coefficients (values 
in portions enclosed by the circles in FIG. 2) having values
other than "0" are coefficients whose motion compensation
prediction is incorrect.

If the motion compensation prediction section 200 per- 
forms motion compensation prediction by using the same 
motion vector as in the base layer supplied from the motion 
vector detector 500, it is estimated that motion compensa- 
tion prediction for coefficients (values in portions enclosed 
by the circles) in the enhancement layer in the same posi- 
tions as in the base layer is incorrect. 

Accordingly, the selector 300 selects BMC for these 
coefficients. 

On the other hand, it is estimated that motion compensa-
tion for other coefficients is correct. Therefore, the selector
300 selects a prediction value in the enhancement layer with 
a smaller encoding deviation. Consequently, the signal EC 
encoded in the enhancement layer is used as the quantized 
error signal of the base layer when motion compensation 
prediction is incorrect, and as the motion compensation 
prediction error signal of the enhancement layer when 
motion compensation prediction is correct. This improves 
the coding efficiency of coefficients whose motion compen- 
sation prediction is incorrect. 

Note that the technique disclosed in the reference cited 
above is based on the assumption that pictures having low 
resolutions are reconstructed in the base layer, and so 
low-frequency coefficients which are the transform coef- 
ficients calculated by the DCT circuit 100 are separated and 
supplied to the base layer. As a consequence, the reliability 
of estimation for switching prediction for each transform 
coefficient is decreased due to an error produced by resolu- 
tion conversion. 

In this embodiment, on the other hand, the resolutions of 
the base layer and the enhancement layer are equal. 
Therefore, the embodiment is different from the technique 
disclosed in the reference cited above in that the accuracy of 
estimation is improved. A great advantage of the embodi- 
ment is a high image quality. 

The configuration of the motion compensation prediction 
sections 200 and 201 used in the apparatus of the present 
invention will be described below. 

FIG. 3A is a block diagram showing the configuration of
the motion compensation prediction sections 200 and 201. 
Each of the motion compensation prediction sections 200 
and 201 consists of an IDCT circuit 210, a frame memory 
220, a motion compensation circuit 230, and a DCT circuit 
240. 

The IDCT circuit 210 restores the reconstructed picture 
signal by performing inverse orthogonal transform (IDCT) 
for the output from the adder 190 or 191. The frame memory 
220 holds the reconstructed picture signal obtained by this 
inverse orthogonal transform, as a reference picture, in units 
of frames. The motion compensation circuit 230 extracts a 
picture in a position indicated by a motion vector in units of 
blocks from the picture signals (reference pictures) stored in 
the frame memory 220. The DCT circuit 240 performs 
orthogonal transform (DCT) for the extracted picture and 
outputs the result. Note that the motion vector is supplied 
from the motion vector detector 500. In this configuration, a 
reconstructed value in a transform coefficient domain is 
inversely transformed into the reconstructed picture signal 
by the IDCT circuit 210 and stored in the frame memory 
220. The motion compensation circuit 230 extracts a picture 
in a position indicated by the motion vector in units of 
blocks from the reference pictures stored in the frame 
memory 220, and supplies the extracted picture to the DCT 
circuit 240. The DCT circuit 240 performs DCT for the 
supplied picture and outputs the result as a motion compen- 
sation prediction value in the DCT coefficient domain. 

In this manner, the motion compensation prediction value 
in the DCT coefficient domain can be obtained. 
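
The chain of the IDCT circuit 210, the frame memory 220, the motion compensation circuit 230, and the DCT circuit 240 can be sketched as follows. The sketch assumes an orthonormal DCT-II, integer-pixel motion vectors, and a single frame buffer without double buffering. The class name MotionCompensationPredictor and its methods are hypothetical.

```python
import math

def dct_matrix(n):
    """Orthonormal n x n DCT-II basis matrix (assumed transform)."""
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos(math.pi * (2 * x + 1) * k / (2 * n))
             for x in range(n)] for k in range(n)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def dct2(block):
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))

def idct2(coeffs):
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)

class MotionCompensationPredictor:
    """Simplified model of a motion compensation prediction section."""

    def __init__(self, height, width, block=4):
        self.block = block
        self.frame = [[0.0] * width for _ in range(height)]  # frame memory 220

    def store(self, top, left, recon_coeffs):
        """IDCT circuit 210: restore pixels and write them to the memory."""
        for r, row in enumerate(idct2(recon_coeffs)):
            for c, value in enumerate(row):
                self.frame[top + r][left + c] = value

    def predict(self, top, left, mv):
        """Circuits 230 and 240: fetch the displaced block and transform it."""
        dy, dx = mv
        n = self.block
        y0, x0 = top + dy, left + dx
        if not (0 <= y0 <= len(self.frame) - n and 0 <= x0 <= len(self.frame[0]) - n):
            raise ValueError("motion vector points outside the reference picture")
        ref = [row[x0:x0 + n] for row in self.frame[y0:y0 + n]]
        return dct2(ref)

mcp = MotionCompensationPredictor(height=4, width=8)
mcp.store(0, 0, dct2([[10.0] * 4 for _ in range(4)]))
bmc = mcp.predict(top=0, left=4, mv=(0, -4))
```

In this usage, the block at column 4 is predicted from the stored block at column 0, so the prediction has the same DC coefficient as the stored flat block.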

The foregoing is the explanation of the encoding appa- 
ratus. The decoding apparatus will be described below. 

FIG. 4 is a block diagram of the decoding apparatus 
according to the first embodiment of the present invention. 

According to the present decoding apparatus, a buffer 162 
on the enhancement layer side receives a coded bit stream 
sent from the encoding apparatus. The output terminal of the 
buffer 162 is connected via a demultiplexer 152 to a
variable-length decoder 142, the output terminal of which is
connected to a dequantizer 172. The output terminal of the
dequantizer 172 is connected to a motion compensation
prediction section 202 via adders 181 and 192. The output
terminal of the motion compensation prediction section 202
is connected to an adder 192 via a selector 300.

The buffer 163 on the base layer side receives the encoded 
bit stream 23 sent from the encoding apparatus. The output 
terminal of the buffer 163 is connected to a variable-length 
decoder 143 via a demultiplexer 153. The output
terminal of the variable-length decoder 143 is connected to 
a motion compensation prediction section 203 via a dequan- 
tizer 173 and an adder 193 and to the switch control terminal 
of a selector 300 via a binarizing circuit 310. The output 
terminal of the motion compensation prediction section 203 
is connected to an adder 193 and the selector 300. 

The input buffer 162, the demultiplexer 152, the variable- 
length decoder 142, the dequantizer 172, the adders 181 and 
192, the selector 300, and the motion compensation predic- 
tion section 202 constitute an enhancement layer. The input 
buffer 163, the demultiplexer 153, the variable-length
decoder 143, the dequantizer 173, the adder 193, the bina- 
rizing circuit 310, and the motion compensation prediction 
section 203 constitute a base layer. 

The input buffer 162 in the enhancement layer receives 
and temporarily holds an encoded multiplexed bit stream 22 
in the enhancement layer. The demultiplexer 152 demulti- 
plexes the bit stream 22 obtained via the input buffer 162, 
i.e., demultiplexes the multiplexed signal into the original 
signals, thereby restoring encoded information of side infor- 
mation and encoded information of a difference value output 
signal EC of a picture. 

The variable-length decoder 142 performs variable-length 
decoding for the encoded signals demultiplexed by the 
demultiplexer 152 to thereby restore the original side infor- 
mation and the difference value output signal EC of the 
picture. On the basis of the information of a quantization 
scale Q_scale of the restored side information, the dequan- 
tizer 172 dequantizes the difference value output signal EC 
of the picture from the variable-length decoder 142 and
outputs the dequantized signal. The adder 181 adds the 
dequantized signal and the dequantized output from the 
dequantizer 173 for the base layer. 

The adder 192 adds the output from the adder 181 and the 
output EP from the selector 300 and outputs the sum. The 
motion compensation prediction section 202 receives the 
output from the adder 192 and the decoded difference value 
output signal EC of the picture which is the output from the 
variable-length decoder 143 for the base layer and obtains a 
motion compensation prediction value EMC. The output 
motion compensation prediction value EMC from the 
motion compensation prediction section 202 is used as an 
enhancement layer output 40 and as one input to the selector 
300.

The selector 300 receives the output (motion compensa- 
tion prediction value EMC) from the motion compensation 
prediction section 202 for the enhancement layer and the 
output from the motion compensation prediction section 203 
for the base layer. In accordance with the output from the 
binarizing circuit 310, the selector 300 selectively outputs 
one of these two inputs. 
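
The enhancement layer reconstruction performed by the dequantizer 172, the adders 181 and 192, and the selector 300 can be sketched per transform coefficient as follows. The dequantization rule (multiplication by Q_scale) is an assumption, and the function name and sample values are illustrative only.

```python
def decode_enhancement(eq, q_scale, base_levels, base_deq, emc, bmc):
    """Reconstruct enhancement layer values in the transform domain.

    eq          : quantized enhancement layer values from decoder 142
    base_levels : quantized base layer values from decoder 143
    base_deq    : dequantized base layer values from dequantizer 173
    """
    recon = []
    for q, level, d, e, b in zip(eq, base_levels, base_deq, emc, bmc):
        ep = e if level == 0 else b    # binarizing circuit 310, selector 300
        residual = q * q_scale + d     # dequantizer 172, then adder 181
        recon.append(ep + residual)    # adder 192
    return recon

recon = decode_enhancement(
    eq=[1, 0, 0, 0],
    q_scale=2,
    base_levels=[1, 0, -2, 0],
    base_deq=[8, 0, -16, 0],
    emc=[97, 20, -5, 2],
    bmc=[90, 21, 9, 3],
)
```

The decoder repeats the same selection as the encoder from the decoded base layer values alone, so both sides form identical predictions.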

The input buffer 163 receives and temporarily holds an 
encoded and multiplexed bit stream 23 for the base layer. 
The demultiplexer 153 demultiplexes the bit stream 23 
obtained via the input buffer 163, i.e., demultiplexes the 
multiplexed signal into the original signals, thereby restor-
ing encoded information of the side information and
encoded information of the difference value output signal 
EC of the picture. 

The variable-length decoder 143 performs variable-length
decoding for the encoded signals demultiplexed by the
demultiplexer 153 to thereby restore the original side infor-
mation and the difference value output signal EC of the
picture. On the basis of the information of the quantization
scale Q_scale of the restored side information, the dequan-
tizer 173 dequantizes the difference value output signal EC
of the picture from the variable-length decoder 143 and
supplies the dequantized signal to the adders 181 and 193.
The adder 193 adds the dequantized signal and the motion
compensation prediction value EMC supplied from the
motion compensation prediction section 203 for the base
layer.

The motion compensation prediction section 203 receives
the output from the adder 193 and the motion compensation
prediction value EMC, which is the output of an immedi-
ately previous frame from the section 203, and obtains the
motion compensation prediction value EMC of the current
frame. The output motion compensation prediction value
EMC from the motion compensation prediction section 203
is used as an output 41 of the base layer and as the other
input to the selector 300.

The operation of the decoding apparatus with the above 
configuration will be described below. In this apparatus, the 
base layer bit stream 23 is supplied to the input buffer 163 
and the enhancement layer bit stream 22 is supplied to the 
input buffer 162. 

The input base layer bit stream 23 is stored in the input
buffer 163 and supplied to the demultiplexer 153. The
demultiplexer 153 demultiplexes the signal in accordance
with the type of the signal. That is, the bit stream 23 is
formed by multiplexing signals of the side information such
as the quantized value of a transform coefficient, the motion
vector, and the quantization scale. Upon receiving the bit
stream 23, therefore, the demultiplexer 153 demultiplexes
the bit stream into the original codes such as the quantized
value of the transform coefficient, the motion vector, and the
quantization scale Q_scale in the side information.

The codes demultiplexed by the demultiplexer 153 are
supplied to the variable-length decoder 143 and decoded
into signals of the quantized value of the transform
coefficient, the motion vector, and the quantization scale
Q_scale. Of the decoded signals, the motion vector is
supplied to the motion compensation prediction section 203,
and the quantized value of the transform coefficient and the
quantization scale Q_scale are supplied to the dequantizer
173. The dequantizer 173 dequantizes the quantized value of
the transform coefficient in accordance with the quantization
scale Q_scale and supplies the dequantized transform coef-
ficient to the adder 193.

The adder 193 adds the dequantized transform coefficient
and the motion compensation prediction value in the trans-
form coefficient domain supplied from the motion compen-
sation prediction section 203, thereby calculating the recon-
structed value in the transform coefficient domain.

This reconstructed value is supplied to the motion com-
pensation prediction section 203. The configuration of the
motion compensation prediction section 203 is as shown in
FIG. 3B. That is, the reconstructed value supplied from the
adder 193 is inversely orthogonally transformed by an IDCT
circuit 210 in the motion compensation prediction section
203 and output as the reconstructed picture signal 41. The
signal is also stored in a frame memory 220 in the motion
compensation prediction section 203.

In the motion compensation prediction section 203, on the
basis of the supplied motion vector described above, a
motion compensation circuit 230 extracts a picture in a
position indicated by the motion vector in units of blocks
from the picture signals (reference pictures) stored in the
frame memory 220. A DCT circuit 240 performs orthogonal
transform (DCT) for the extracted picture and outputs the
result as a transform coefficient output BMC to the adder
193 and the selector 300.

Meanwhile, the enhancement layer bit stream 22 is sup-
plied to the enhancement layer. This bit stream 22 is stored
in the enhancement layer input buffer 162 and supplied to
the demultiplexer 152.

The demultiplexer 152 demultiplexes the bit stream 22.
That is, the bit stream 22 is formed by multiplexing signals
of the side information such as the quantized value of a
transform coefficient, the motion vector, and the quantiza-
tion scale Q_scale. Upon receiving the bit stream 22,
therefore, the demultiplexer 152 demultiplexes the bit
stream into the original codes such as the quantized value of
the transform coefficient, the motion vector, and the quan-
tization scale Q_scale.

The codes demultiplexed by the demultiplexer 152 are
supplied to the variable-length decoder 142 and decoded
into signals of the quantized value of the transform
coefficient, the motion vector, and the like. Of the decoded
signals, the motion vector is supplied to the motion com-
pensation prediction section 202, and the quantized value of
the transform coefficient and the quantization scale Q_scale
are supplied to the dequantizer 172. The dequantizer 172
dequantizes the quantized value of the transform coefficient
in correspondence with the quantization scale Q_scale and
supplies the dequantized transform coefficient to the adder
181. The dequantized value is added to a dequantized value
31 of the base layer supplied from the dequantizer 173, and
the sum is supplied to the adder 192.

The adder 193 adds the output from the adder 181 and a
signal EP supplied from the selector 300 to thereby calculate
the reconstructed value in the transform coefficient domain.
This reconstructed value is supplied to the motion compen-
sation prediction section 202. The configuration of the
motion compensation prediction section 202 is as shown in
FIG. 3B. That is, the reconstructed value supplied from the
adder 193 is inversely orthogonally transformed by an IDCT
circuit 210 in the motion compensation prediction section
202 and output as a reconstructed picture signal 40. The

ficient output EMC from the motion compensation predic-
tion section 202. If the value is "1", the selector 300 selects
the transform coefficient output BMC from the motion
compensation prediction section 203.

That is, the binarizing circuit 310 outputs "0" when the
value of the output BQ from the variable-length decoder 143
in the base layer is "0" and outputs "1" when the value is not
"0". Therefore, the selector 300 is made to select EMC when
the output from the binarizing circuit 310 is "0" and BMC
when the output is "1". Consequently, the transform coef-
ficient output EMC from the motion compensation predic-
tion section 202 in the enhancement layer is applied to a
transform coefficient in a position where the output BQ from
the variable-length decoder 143 is "0", and the transform
coefficient output BMC from the motion compensation
prediction section 203 in the base layer is applied to a
transform coefficient in a position where the output BQ from
the variable-length decoder 143 is not "0".

The output from the variable-length decoder 143 in the
base layer contains the motion compensation prediction
error signal and the motion vector obtained on the encoding
side. When the motion compensation prediction error signal
and the motion vector are supplied to the motion compen-
sation prediction section 203, the motion compensation
prediction section 203 obtains the motion compensation
prediction error between the picture of the immediately
previous frame and the current picture.

Meanwhile, the binarizing circuit 310 receives the
decoded base layer motion compensation prediction value
signal from the variable-length decoder 143. If the signal
value is "0", the binarizing circuit 310 outputs "0" to the
selector 300. If the signal value is not "0", the binarizing
circuit 310 outputs "1" to the selector 300.

If the output from the binarizing circuit 310 is "0", the
selector 300 selects the output EMC with a smaller encoding
deviation from the enhancement layer motion compensation
prediction section 203. If the output from the binarizing
circuit 310 is "1", the selector 300 selects the transform
coefficient output BMC with a larger encoding deviation
from the base layer motion compensation prediction section
202.

Eventually, if the DCT coefficient error obtained by the
base layer motion compensation prediction is "0", the output
from the motion compensation prediction section 202 which
is the reconstructed value of the transform coefficient output
EMC from the enhancement layer motion compensation
prediction section 200 is selected. If the error is "1", the

signal is also stored in a frame memory 220 in the motion output from the motion compensation prediction section 203 

compensation prediction section 202 . 50 which is the reconstructed value of the transform coefficient 

In the motion compensation prediction section 202, on the output BMC from the base layer motion compensation 

basis of the supplied motion vector described above a prediction section 201 is selected. 

motion compensation circuit 230 extracts a picture in a This processing is analogous to the processing in the 

position indicated by the motion vector in units of blocks encoding apparatus. Accordingly, as the transform coeffi- 

from the picture signals (reference pictures) stored in the ss dent output of motion compensation prediction in the 

frame memory 220. A DCT circuit 240 performs orthogonal enhancement layer, as in the selection done on the encoding 

transform (DCT) for the extracted picture and outputs the side, an output for the base layer is used in a portion where 

result as a transform coefficient output BMC to the adder motion compensation prediction is incorrect, and an output 

193 and the selector 300. for the enhancement layer with a smaller encoding deviation 

The selector 300 receives the decision result from the to is used in a portion where the prediction is correct, 

binarizing circuit 310 and selects one of BMC and EMC. Consequently, following this switching the encoding appa- 

That is, the binarizing circuit 310 receives an output BQ ratus can smoothly reconstruct pictures, 

from the variable- length decoder 143 and decides whether In the first embodiment described above, each frame of a 

the value is "0", This decision result is supplied to the video picture is divided into matrix blocks each having a 

selector 300. 65 predetermined number (NxN) of pixels and orthogonally 

If the value of the output BQ from the variable-length transformed to obtain transform coefficients of individual 

decoder 143 is "0", the selector selects the transform coef- spacial frequency bands. For each of the NxN transform 
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coefficients thus obtained, motion compensation is per- tion for a picture in the interest region part on the basis of 
formed in the domain of the transform coefficient in upper a reconstructed value in a transform coefficient domain 
and lower layers. When motion compensation is to be supplied from an adder 190 and a reconstructed value in an 
performed in this video encoding, whether motion compen- immediately previous frame. The motion compensation pre- 
sation prediction is correct is checked on the basis of an 5 diction section 601 refers to the alpha-map and performs 
already decoded and quantized value in the lower layer (base motion compensation prediction for the picture in the inter- 
layer). If the moUon compensation prediction is correct, the est region parl on lhe basis of a reconstructed value in the 
upper layer (enhancement layer) is encoded by using a transform coefficient domain supplied from an adder 191 
motion compensation prediction value with a smaller encod- and ^ reconstructed value in me imme diately previous 
ing deviation obtained for the upper layer. If the motion f rame 
compensation prediction is incorrect, the upper layer is 

encoded by using a motion compensation prediction value motion vector detector 510 refers to the alpha-map 

obtained for the lower layer (base layer) and having a larger and detects a motion vector in the picture in the interest 

encoding deviation than that for the enhancement layer. This re e ion P art from me pictures stored in the frame memory 

improves the coding efficiency of a coefficient the motion ^00. 

compensation prediction for which is incorrect and thereby 15 The multiplexer 155 is provided for the base layer. The 

realizes an encoding system capable of encoding with little multiplexer 155 multiplexes a variable-length code of a 

decrease in the coding efficiency. prediction error signal from a variable -length encoder 141, 

lhe foregoing is an embodiment in which a whole video a variable-length code of side information such as mode 
picture is efficiently encoded in the scalable encoding information containing quantization scale information, a 
method. An embodiment in which the present invention is 20 variable-length code of a motion vector, and a code (alpha- 
applied to arbitrary-shape picture encoding by which a code) of a separately supplied alpha-map, and supplies the 
background and an object in a video picture are separately multiplexed signal to the output buffer 161. 
encoded will be described below. This second embodiment In this apparatus with the above configuration, an input 
of the present invention will be described with reference to picture signal 10 is temporarily stored in the frame memory 
FIGS. 5, 6A, 6B, and 7. In this embodiment, the technique 2 700 and read out to the arbitrary shape orthogonal transform 
of the first embodiment is applied to pictures having arbi- circuit 101 and the motion vector detector 510. In addition 
trary shapes represented by alpha-map signals. to the picture signal 10, an alpha-map signal 50 which is a 

FTG. 5 shows an encoding apparatus of the present map information signal for distinguishing a background 

invention as the second embodiment. The basic configura- 3Q portion from an object portion in a picture is input to the 

tion of this encoding apparatus is the same as the encoding arbitrary shape orthogonal transform circuit 101. 

apparatus explained in the first embodiment. Accordingly, This alpha-map signal can be acquired by applying, e.g., 

the same reference numerals as in the configurations shown a chromakey technique. For example, in the case of an 

in FIGS. 1 and 4 denote the same parts and a detailed alpha-map for distinguishing a person (object) from a 

description thereof will be omitted. 35 background, the image of the person is taken by the chro- 

This configuration differs from FIG. 1 in eight points; that makey technique and binarized to obtain a bit-map picture in 

is, an arbitrary shape orthogonal transform circuit 101 is which the person image region is "1" and the "background 

provided instead of the DCT circuit 100, inputs are received region is "0" This picture can be used as an alpha-map. 

via a frame memory 700, an encoding controller 420 is The arbitrary-shape orthogonal transform circuit 101 

provided instead of the encoding controller 400, an encoding 4Q refers to this alpha-map signal, checks where the object 

controller 430 is provided instead of the encoding controller region of the picture is, divides the rectangle region includ- 

410, a motion compensation prediction section 600 is pro- ing the object region into square blocks each consisting of 

vided instead of the motion compensation prediction section NxN pixels, and orthogonally transforms each block to 

200 for an enhancement layer, a motion compensation obtain NxN transform coefficients. As a technique to 

prediction section 601 is provided instead of the motion 45 orthogonally transform an arbitrary-shape region of a picture 

compensation prediction section 201, a motion vector detec- by using an alpha-map, it is only necessary to use a 

tor 510 is provided instead of the motion vector detector technique established by the present inventors and disclosed 

500, and a multiplexer 155 is provided instead of the in above-mentioned Japanese Patent Application No. 

multiplexer 151. 7-97073 which is already filed. 

The frame memory 700 temporarily holds an input picture 50 Although the explanation of the operation of the encoding 

signal in units of frames. The arbitrary shape orthogonal apparatus according to the second embodiment has not 

transform circuit 101 extracts an object region from the finished, processing of decreasing a step will be described 

pictures stored in the frame memory 700 by referring to a below as a modification. 

separately supplied alpha-map. The circuit 101 divides the [ n the method of the second embodiment described above, 
rectangle region including the object region into blocks of a 55 the average value of the object is arranged in the back- 
predetermined pixel size and performs DCT for each block. ground. In addition to this processing, if pixel values of the 
The encoding controller 420 refers to the alpha-map and object are compressed around the average value by a pre- 
generates a quantization scale Q_scale, which gives an determined scaling coefficient, the step of a pixel value in the 
enhancement layer optimum quantization scale to output boundary between the object and the background can be 
buffer capacity information from an output buffer 160, and $0 decreased. Details of this processing will be described 
side information. The encoding circuit 430 refers to the below. 

alpha- map and generates a quantization scale Q_scale, To decrease the step of a pixel value in the boundary 

which gives a base layer optimum quantization scale and between the object and the background, pixel values of the 

side information to output buffer capacity information from object are compressed around the average value by a pre- 

an output buffer 161, and. side information. 65 determined scaling coefficient. Examples of the method are 

The motion compensation prediction section 600 refers to illustrated in FIGS. 16 and 17. Although actual pictures are 

the alpha-map and performs motion compensation predic- two-dimensional signals, one-dimensional signals are 
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shown for simplicity. In these drawings, a pixel value is 
plotted on the ordinate, and a pixel row is plotted on the 
abscissa. The left-hand side of the position of a pixel row e 
is an object region, and the right-hand side is a background 
region. FIG. 16 shows a state in which a pixel value average 5 
value a of the object is arranged in the background portion. 

FIG. 17 shows the result of compression around the pixel
value average value a. Assuming the luminance before the
compression is x and the luminance after the compression is
y, the luminance y after the compression can be represented
by

$$y = c \times (x - a) + a$$

where c is a constant between "0" and "1".

Compression can be performed for all pixels x1 to x23 in
an object shown in FIG. 18. However, the step can also be
decreased by compressing only the pixels x1 to x8 in the
object close to the boundary to the background portion.
Although an additional arithmetic operation is necessary to
check whether a pixel is close to the boundary, the method
has the advantage that the pixels x9 to x23 in the object not
in contact with the boundary to the background are kept
unchanged.

In this modification, it is decided that, of pixels in the
object, those in contact with the background in any of the
upper, lower, left, and right portions are pixels close to the
boundary.
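The boundary test and the compression around the average value can be sketched as follows. This is a minimal illustration, not the circuit of the embodiment: it assumes the compression y = c(x - a) + a, represents the block and the alpha-map as nested lists, and treats positions outside the block as not being background. The names `touches_background`, `compress_boundary`, and `c_scale` are hypothetical.

```python
def touches_background(alpha, r, c):
    """Return True if pixel (r, c) has a background pixel (alpha == 0)
    directly above, below, left, or right of it.
    Assumption: positions outside the block do not count as background."""
    h, w = len(alpha), len(alpha[0])
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < h and 0 <= cc < w and alpha[rr][cc] == 0:
            return True
    return False


def compress_boundary(pixels, alpha, c_scale):
    """Compress only the object pixels in contact with the background
    around the object average a, using y = c_scale * (x - a) + a.
    Returns the altered block and the object average."""
    h, w = len(pixels), len(pixels[0])
    obj = [pixels[r][c] for r in range(h) for c in range(w) if alpha[r][c] == 1]
    a = sum(obj) / len(obj)
    out = [row[:] for row in pixels]
    for r in range(h):
        for c in range(w):
            if alpha[r][c] == 1 and touches_background(alpha, r, c):
                out[r][c] = c_scale * (pixels[r][c] - a) + a
    return out, a
```

Object pixels that are not in contact with the background are returned unchanged, which is the advantage noted above for compressing only the boundary pixels.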

The foregoing is the modification by which pixel values
in the object are compressed around the average value by a
predetermined scaling coefficient in order to decrease the
step of a pixel value in the boundary between the object and
the background. However, the step of a pixel value in the
boundary between the object and the background can also be
decreased by processing the background portion. This
modification will be described below.

FIG. 19 shows the modification of processing the
background portion. In this modification, of pixels in the
background, the values of pixels close to the object region
are so altered as to decrease the step. A practical example is
shown in FIG. 20. Referring to FIG. 20, xn (n=1 to 23)
indicates a pixel value in the object and an (n=1 to 41)
indicates a pixel value in the background. Before the
processing, all pixel values an in the background are equal
to a pixel average value a.

First, the values of background pixels a1 to a9 in contact
with pixels in the object region in any of the upper, lower,
left, and right portions are replaced with average values of
the values of the contacting object pixels xn and their own
pixel values. For example, the background pixel a1 is
replaced with "(a1+x1)/2", and the background pixel a3 is
replaced with "(a3+x2+x3)/3".

Subsequently, the background pixels a10 to a17 in contact
with the background pixels a1 to a9 are similarly replaced.
As an example, a10 is replaced with "(a10+a1)/2". As the
background pixel a1, the previously replaced value is used.

Likewise, the background pixels a18 to a24 are
sequentially replaced. As a consequence, the steps of pixel values
in the boundary between the object and the background are
decreased.
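The layer-by-layer replacement of background pixels can be sketched as below. This is an illustrative reading of the procedure, under stated assumptions: each pass handles the background pixels that touch an already processed pixel (object pixels first, then previously replaced background pixels), and each such pixel is replaced with the average of its own value and the contacting processed values. The function name `smooth_background` and the `layers` parameter are hypothetical; the example above uses three layers.

```python
def smooth_background(pixels, alpha, layers):
    """Replace background pixels (alpha == 0) layer by layer, starting
    from those in contact with the object (alpha == 1).
    Each replaced value is the average of the pixel's own value and the
    values of the 4-neighbours processed in earlier passes."""
    h, w = len(pixels), len(pixels[0])
    out = [row[:] for row in pixels]
    # Object pixels count as already processed.
    done = [[alpha[r][c] == 1 for c in range(w)] for r in range(h)]
    for _ in range(layers):
        updates = {}
        for r in range(h):
            for c in range(w):
                if done[r][c]:
                    continue
                neigh = []
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w and done[rr][cc]:
                        neigh.append(out[rr][cc])
                if neigh:
                    updates[(r, c)] = (out[r][c] + sum(neigh)) / (1 + len(neigh))
        if not updates:
            break
        # Commit the whole layer at once so pixels of the same layer
        # do not influence one another.
        for (r, c), value in updates.items():
            out[r][c] = value
            done[r][c] = True
    return out
```

For a one-dimensional row with one object pixel of value 100 followed by background pixels of value 60, two layers give 80 and then 70, so the step at the boundary is spread over several pixels.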

When pixel values are altered as described above, the
average value of the block changes. Therefore, pixel values
in the background portion can also be corrected so that the
original block average value remains unchanged. This
correction is to add or subtract a predetermined value to or from
all pixels in the background or to alter pixel values in the
background far from the boundary in a direction opposite to
the luminance direction in which pixel values close to the
boundary are altered.
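The first of these corrections, adding or subtracting one value to or from all background pixels, can be sketched as follows. This is only an illustration: the text does not say how the predetermined value is chosen, so the sketch assumes it is the value that exactly restores the original block sum. The name `restore_block_mean` is hypothetical.

```python
def restore_block_mean(original, altered, alpha):
    """Add one constant to every background pixel (alpha == 0) of the
    altered block so that its average equals that of the original block.
    Assumption: the constant is chosen to cancel the change in block sum."""
    h, w = len(original), len(original[0])
    n_bg = sum(1 for r in range(h) for c in range(w) if alpha[r][c] == 0)
    if n_bg == 0:
        return [row[:] for row in altered]
    diff = sum(map(sum, original)) - sum(map(sum, altered))
    delta = diff / n_bg
    return [
        [altered[r][c] + delta if alpha[r][c] == 0 else altered[r][c]
         for c in range(w)]
        for r in range(h)
    ]
```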



When pixel values in the object are altered, a picture close
to an input picture can be obtained by restoring the portion
before the alteration after the picture is decoded. For
example, in the above method in which compression is
performed around the average value, alteration is done as
follows assuming that a decoded value of a pixel compressed
in encoding is yd and a pixel value after the
alteration is xd:

$$xd = (yd - ad) / c + ad$$

where ad is the average value of the object
or the background of the decoded picture or the average
value of the whole block, and c is the constant used in the
compression. Although yd often takes a value
somewhat different from y due to an encoding/decoding
distortion, xd close to x can be obtained by this alteration.
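The restoration on the decoding side is the inverse of the compression. A minimal sketch, assuming the compression y = c(x - a) + a and therefore the expansion xd = (yd - ad)/c + ad; the function names are hypothetical.

```python
def compress(x, a, c):
    """Encoder side: compress pixel value x around the average a (0 < c < 1)."""
    return c * (x - a) + a


def expand(yd, ad, c):
    """Decoder side: expand decoded value yd around the decoded average ad.
    This inverts compress() exactly only when yd == y and ad == a."""
    return (yd - ad) / c + ad
```

With a = 100 and c = 0.5, a pixel value of 200 is compressed to 150 and expanded back to 200. When yd or ad is distorted by encoding and decoding, the expansion divides the deviation from ad by c, so the result is close to, but not exactly, the original value.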

According to the encoding apparatus of the above
modifications, as shown in FIG. 21, a picture signal is input
to the input terminal of a switch 2003, one output terminal of
which is connected to a switch 2009, a compressor 2010, and
an average circuit 2011. The output terminal of the
compressor 2010 is connected to the other input terminal of the
switch 2009. The output terminal of the switch 2009 is
connected to the switch 2005. The output terminal of the
average circuit 2011 is connected to the compressor 2010
and the switch 2005.

A decision circuit 2004 receives an alpha-map signal and 
the output terminal of the decision circuit 2004 is connected 
to the control terminal of the switch 2009. An encoder 2006 
receives the alpha-map signal and a DCT circuit 2017 
receives a signal selected by the switch 2005. 

In the encoding apparatus constructed as described above,
the encoder 2006 encodes an externally input alpha-map
signal 2001. The switch 2003 receives the alpha-map signal
2001 and a picture signal 2002. On the basis of the
alpha-map signal 2001, the switch 2003 divides the picture signal
2002 into an object picture 2007 and a background picture
2008 and discards the background picture 2008.
"Discarding" does not necessarily mean "sending the picture to some
other place" but simply means that the picture is left unused
after that.

The decision circuit 2004 decides on the basis of the 
alpha-map signal 2001 whether an interest pixel which is a 
pixel currently being processed in the object picture 2007 
supplied via the switch 2003 is in contact with the back- 
ground. The decision circuit 2004 supplies a decision result 
2013 to the switch 2009. 

The average circuit 2011 calculates an average value 2012 
of the object picture 2007 supplied via the switch 2003 and 
outputs the average value 2012 to the compressor 2010 and 
the switch 2005. The compressor 2010 compresses the 
amplitude of the object picture 2007 around the average 
value 2012 to obtain a compressed picture 2014 and outputs 
the compressed picture 2014 to the switch 2009. 

The switch 2009 receives the compressed picture 2014 
and the object picture 2007 from the switch 2003 and refers 
to the decision result 2013 from the decision circuit 2004. If 
the interest pixel is a pixel in a portion in contact with the 
background, the switch 2009 selectively outputs the com- 
pressed picture 2014 as an encoded picture 2015 of the 
object. If the interest pixel is not in contact with the 
background, the switch 2009 selectively outputs the object 
picture 2007 as the encoded picture 2015 of the object. 

The switch 2005 receives the alpha-map signal 2001, the
encoded picture 2015 of the object supplied from the switch
2009, and the average value 2012 calculated by the average
circuit 2011. On the basis of the input alpha-map signal
2001, if the interest pixel which is a pixel currently being
processed is the object, the switch 2005 selectively outputs
the encoded picture 2015 of the object as an encoded picture
2016. If the interest pixel is the background, the switch 2005
selectively outputs the average value 2012 as the encoded
picture 2016.

The DCT circuit 2017 performs DCT for the output
encoded picture 2016 from the switch 2005 and outputs a
transform coefficient 2018.

In the encoding apparatus with the above configuration,
the alpha-map signal 2001 and the picture signal 2002 are
externally input. The alpha-map signal 2001 is supplied to
the switch 2003, the decision circuit 2004, the switch 2005,
and the encoder 2006. The picture signal 2002 is supplied to
the switch 2003.

On the basis of the input alpha-map signal 2001, the 
switch 2003 divides the picture signal 2002 into the object 
picture 2007 and the background picture 2008 and discards 
the background picture 2008. As described previously, "dis- 
carding" does not necessarily mean "sending the picture to 
some other place" but simply means that the picture is left 
unused after that. 

The object picture 2007 separated by the switch 2003 is 
supplied to the switch 2009, the compressor 2010, and the 
average circuit 2011. The average circuit 2011 calculates the 
average value 2012 of the object picture and supplies the 
average value 2012 to the compressor 2010 and the switch 
2005. The compressor 2010 compresses the amplitude of the 
object picture 2007 around the average value 2012 and 
supplies the compressed picture 2014 obtained by this 
compression to the switch 2009. 

The decision circuit 2004 which has received the alpha- 
map signal 2001 decides whether an interest pixel which is 
a pixel currently being processed is in contact with the 
background, and supplies the decision result 2013 to the 
switch 2009. If the decision result 2013 from the decision 
circuit 2004 indicates that the interest pixel is in contact with 
the background, the switch 2009 selectively outputs the 
compressed picture 2014 as the encoded picture 2015 of the 
object. If the interest pixel is not in contact with the 
background, the switch 2009 selectively outputs the object 
picture 2007 as the encoded picture 2015 of the object. 

The output encoded picture 2015 of the object from the
switch 2009 is supplied to the switch 2005. The switch 2005
refers to the alpha-map signal 2001 and, if the interest pixel
is the object, selectively outputs the encoded picture 2015 of
the object as the encoded picture 2016. If the interest pixel
is the background, the switch 2005 selectively outputs the
average value 2012 as the encoded picture 2016.

The encoded picture 2016 output from the switch 2005 is 
supplied to the DCT circuit 2017. The DCT circuit 2017 
performs DCT for the encoded picture 2016 to obtain the 
transform coefficient 2018 and outputs the transform
coefficient 2018 to the outside. The alpha-map signal 2001 is
encoded by the encoder 2006 and output to the outside as an 
alpha-code 2019. 

Note that there is another method in which an alpha-map 
is encoded before a picture is encoded and the decoded 
signal is input to the switches 2003 and 2005 and the 
decision circuit 2004. If a distortion occurs in encoding and 
decoding of an alpha-map, the alpha-map signals on the 
encoding and decoding sides can be made equal by this 
method. 

FIG. 22 shows a decoding apparatus as a counterpart of
the encoding apparatus in FIG. 21. According to this
decoding apparatus, a decoder 2020 receives an encoded
alpha-map signal. The output terminal of the decoder 2020 is
connected to control terminals of switches 2023 and 2025
and a decision circuit 2024. An inverse DCT circuit 2021
receives the transform coefficient 2018 of the encoded picture.
The output terminal of the inverse DCT circuit 2021 is connected
to a decompressor 2030 and an average circuit 2031. The
output terminal of the decompressor 2030 is connected to a
switch 2029 together with the output terminal of the switch
2023. The switch 2029 is connected to the switch 2025.

In the decoding apparatus as described above, the decoder
2020 receives the externally input alpha-code 2019, decodes
the alpha-code 2019, and supplies a decoded alpha-map
signal 2022 to the switch 2023, the decision circuit 2024,
and the switch 2025. The inverse DCT circuit 2021 performs
inverse DCT for the externally input transform coefficient
2018 to decode a picture and supplies the picture obtained by
the decoding, i.e., a decoded picture 2026, to the switch
2023.

The decision circuit 2024 decides on the basis of the
alpha-map signal 2022 decoded by the decoder 2020
whether an interest pixel in an object picture 2027 is in
contact with the background. The decision circuit 2024
outputs a decision result 2034 to the switch 2029.

On the basis of the alpha-map signal 2022 decoded by the
decoder 2020, the switch 2023 divides the decoded picture
2026 supplied from the inverse DCT circuit 2021 into the
object picture 2027 and a background picture 2028. The
switch 2023 outputs the object picture 2027 to the switch
2029, the decompressor 2030, and the average circuit 2031
and discards the background picture 2028.

The average circuit 2031 calculates an average value 2032
of the object picture 2027 supplied from the switch 2023 and
outputs the average value 2032 to the decompressor 2030.
The decompressor 2030 expands the amplitude of the object
picture 2027 around the average value 2032 to obtain an
expanded picture 2033 and outputs the expanded picture
2033 to the switch 2029.

Of the object picture 2027 and the expanded picture 2033
thus input, the expanded picture 2033 is selectively output as
a decoded picture 2035 of the object from the switch 2029
to the switch 2025, if the output decision result 2034 from
the decision circuit 2024 indicates that the interest pixel is in
contact with the background. If the interest pixel is not in
contact with the background, the switch 2029 selectively
outputs the object picture 2027 as the decoded picture 2035
of the object to the switch 2025.

The switch 2025 receives the decoded picture 2035 of the
object and a signal 2037 which is separately input as the
background, and refers to the alpha-map signal 2022. If the
interest pixel is the object, the switch 2025 selectively
outputs the decoded picture 2035 of the object as a
reconstructed picture 2036 to the outside. If the interest pixel is the
background, the switch 2025 selectively outputs the signal
2037 as the reconstructed picture 2036 to the outside.

In the decoding apparatus with the above configuration,
the alpha-code 2019 and the transform coefficient 2018 are
externally input. The alpha-code 2019 is supplied to the
decoder 2020, and the transform coefficient 2018 is supplied
to the inverse DCT circuit 2021.

The decoder 2020 decodes the alpha-map signal 2022 and
outputs the decoded signal to the switch 2023, the decision
circuit 2024, and the switch 2025. The inverse DCT circuit
2021 decodes the picture and supplies the decoded picture
2026 to the switch 2023.

On the basis of the alpha-map signal 2022 decoded by the
decoder 2020, the switch 2023 divides the decoded picture
2026 into the object picture 2027 and the background picture
2028 and discards the background picture 2028. The object
picture 2027 separated by the switch 2023 is supplied to the
switch 2029, the expander 2030, and the average circuit
2031.

The average circuit 2031 calculates the average value
2032 of the object picture 2027 and supplies the average
value 2032 to the expander 2030.
The expander 2030 expands the amplitude of the object
picture 2027 around the average value 2032 and supplies the
expanded picture 2033 thus obtained to the switch 2029.

The decision circuit 2024 decides whether an interest 
pixel in the object picture 2027 is in contact with the 
background and supplies the decision result 2034 to the 
switch 2029. If the decision result 2034 indicates that the 
interest pixel is in contact with the background, the switch 
2029 selectively outputs the expanded picture 2033 as the 
decoded picture 2035 of the object. If the interest pixel is not 
in contact with the background, the switch 2029 selectively 
outputs the object picture 2027 as the decoded picture 2035 
of the object. 

The output decoded picture 2035 of the object from the
switch 2029 is supplied to the switch 2025. The switch 2025
refers to the alpha-map signal 2022 and, if the interest pixel
which is a pixel currently being processed is the object,
outputs the decoded picture 2035 of the object as the
reconstructed picture 2036 to the outside. If the interest pixel
is the background, the switch 2025 selectively outputs the
signal 2037, which is separately input as the background, as
the reconstructed picture 2036. Note that a reconstructed
signal of a background picture which is separately encoded
or a predetermined pattern is used as the background signal
2037.

The foregoing are examples of the processing of decreas- 
ing the step. 

The examples of the processing of decreasing the step 
have been described above, and a description will return to 
the subject of the second embodiment. 

As already described above, the arbitrary shape orthogo- 
nal transform circuit 101 refers to the alpha-map signal, 
checks the interest region of the picture, divides the interest 
region of the picture into square blocks each consisting of 
NxN pixels, and orthogonally transforms each block to 
obtain NxN transform coefficients. 

For a block containing the boundary between the object 
and the background, a transform coefficient for the object 
and a transform coefficient for the background are calcu- 
lated. These transform coefficients are supplied to adders 
110 and 111 for the enhancement layer and the base layer, 
respectively. 

Upon receiving the transform coefficient, the adder 111 of 
the base layer calculates a prediction error signal between 
this transform coefficient and a motion compensation pre- 
diction value (BMC) which is converted into an orthogonal 
transform coefficient and supplied from the motion compen- 
sation prediction section 601. The adder 111 supplies the 
result to a quantizer 131. The quantizer 131 quantizes the 
prediction error signal in accordance with the quantization 
scale Q_scale supplied from the encoding controller 430 
and supplies the quantized signal to the variable-length 
encoder 141 and a dequantizer 171. 

The variable-length encoder 141 performs variable-length 
encoding for the quantized prediction error signal. The 
variable-length encoder 141 also performs variable-length 
encoding for the side information such as the mode infor- 
mation containing the quantization scale information sup- 
plied from the encoding controller 430 and the motion 
vector supplied from the motion vector detector 510. 

These variable-length codes obtained by the variable- 
length encoder 141 are supplied to the multiplexer 155. The 
multiplexer 155 multiplexes these variable-length codes 
together with an alpha-code 55 which is encoded and 
supplied to the multiplexer 155. The multiplexer 155 outputs 
the multiplexed signal to the output buffer 161. 

The output buffer 161 outputs the multiplexed signal as an
encoded bit stream 21 to a transmission line or a storage
medium and also feeds the capacity of the buffer back to the
encoding controller 430. In accordance with this buffer
capacity, the encoding controller 430 generates the optimum
quantization scale Q_scale.

The quantized value of the prediction error signal sup- 
plied to the dequantizer 171 is dequantized by the dequan- 
tizer 171. The adder 191 adds the dequantized value to the 
motion compensation prediction value (BMC), thereby cal- 
culating a reconstructed value in the transform coefficient 
domain. The reconstructed value is supplied to the motion 
compensation prediction section 601. 
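The base layer loop described above (prediction error, quantization, dequantization, and reconstruction in the transform coefficient domain) can be sketched as follows. This is an illustration only: the text does not specify the quantizer, so the sketch assumes a uniform quantizer that divides by Q_scale and truncates toward zero. The name `base_layer_step` is hypothetical.

```python
def base_layer_step(coefs, bmc, q_scale):
    """One base layer pass over the transform coefficients of a block.

    coefs   : transform coefficients of the input block
    bmc     : motion compensation prediction values (BMC), same length
    q_scale : quantization scale (assumed uniform step size)

    Returns (bq, recon): the quantized prediction errors BQ and the
    reconstructed coefficients fed back to the prediction section.
    """
    # Adder 111: prediction error; quantizer 131: assumed truncation.
    bq = [int((x - p) / q_scale) for x, p in zip(coefs, bmc)]
    # Dequantizer 171 followed by adder 191.
    recon = [p + q * q_scale for p, q in zip(bmc, bq)]
    return bq, recon
```

A coefficient whose prediction error is smaller than the quantization step yields BQ equal to 0, so its reconstructed value is simply the prediction BMC.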

In the enhancement layer, a selector 300 performs selec- 
tion on the basis of the value of an output (BQ) from the 
quantizer 131 in the base layer. That is, the selector 300 
adaptively switches the output (EMC) from the motion 
compensation prediction section 600 in the enhancement 
layer and the output (BMC) from the motion compensation 
prediction section 601 in the base layer for each transform 
coefficient by using a method to be described later and 
outputs the selected input as EP. 

More specifically, the output (BQ) from the quantizer 131 
in the base layer is supplied to a binarizing circuit 310. If the 
value of BQ is "0", the binarizing circuit 310 outputs "0" to 
the selector 300. If the value is not "0", the binarizing circuit 
310 outputs "1" to the selector 300. 

If the output from the binarizing circuit 310 is "0", the 
selector 300 selectively outputs EMC as EP. If the output is 
"1", the selector 300 selectively outputs BMC as EP. 
Consequently, the transform coefficient output EMC from 
the motion compensation prediction section 600 in the 
enhancement layer is applied to a transform coefficient in a 
position where the output BQ from the quantizer 131 is "0", 
and the transform coefficient output BMC from the motion 
compensation prediction section 601 in the base layer is 
applied to a transform coefficient in a position where the 
output BQ from the quantizer 131 is not "0". 
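
The per-coefficient switching described above can be sketched as follows. This is a minimal illustration, not the circuit itself: the function names and the toy 2x2 coefficient values are assumptions introduced here, and blocks are modeled as nested Python lists.

```python
def binarize(bq):
    """Binarizing circuit 310: 0 where BQ is 0, 1 elsewhere."""
    return [[0 if q == 0 else 1 for q in row] for row in bq]


def select_prediction(bq, emc, bmc):
    """Selector 300: output EMC where the flag is 0 and BMC where it is 1."""
    flags = binarize(bq)
    return [
        [b if f else e for f, e, b in zip(frow, erow, brow)]
        for frow, erow, brow in zip(flags, emc, bmc)
    ]


# Toy 2x2 block of transform coefficients (values are made up).
bq = [[3, 0], [0, -1]]
emc = [[10, 20], [30, 40]]
bmc = [[11, 21], [31, 41]]
ep = select_prediction(bq, emc, bmc)
print(ep)  # [[11, 20], [30, 41]]
```

BMC is taken only in the two positions where BQ is nonzero; every other position takes EMC.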

The quantizer 131 in the base layer receives and quantizes 
the output from the adder 111. The adder 111 receives the 
output from the arbitrary-shape orthogonal transform circuit 
101 and the motion compensation prediction value obtained 
from a picture in an immediately previous frame by the 
motion compensation prediction section 601 and calculates 
the difference between them. Therefore, if the motion com- 
pensation prediction value is correct, the difference between 
the two values output from the adder 111 is "0". 

Accordingly, of the quantized values as the output BQ 
from the quantizer 131 in the base layer, coefficients whose 
values are not "0" are coefficients representing that the 
motion compensation prediction is incorrect. 

If the motion compensation prediction section 600 per- 
forms motion compensation prediction by using the same 
motion vector as in the base layer supplied from the motion 
vector detector 510, it is estimated that motion compensa- 
tion prediction for coefficients in the enhancement layer in 
the same positions as in the base layer is incorrect. 

For these coefficients, therefore, the selector 300 selects 
BMC as the output from the motion compensation predic- 
tion section 601 for the base layer. 

On the other hand, since it is estimated that motion 
compensation for other coefficients is correct, the selector 
300 selects the prediction value in the enhancement layer 
with a smaller encoding distortion. Consequently, the signal 
EC encoded in the enhancement layer is the quantized error 
signal of the base layer if motion compensation prediction is 
incorrect, and is the motion compensation prediction error 
signal of the enhancement layer if motion compensation 
prediction is correct. This improves the encoding efficiency 
for coefficients whose motion compensation prediction is 
incorrect. 

The adder 110 in the enhancement layer calculates a 
prediction error signal between the transform coefficient of 
the input picture supplied from the arbitrary-shape orthogo- 
nal transform circuit 101 and the output (EP) from the 
selector 300 and supplies the result to an adder 121. 

The adder 121 receives a dequantized value 30 of BQ 
supplied from the dequantizer 171. Accordingly, the adder 
121 calculates the difference EC between the value 30 and 
the output from the adder 110 and supplies the difference 
EC as a prediction error signal to a quantizer 130. 
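
As a rough numerical sketch of how the signal EC is formed, the following assumes a toy uniform quantizer that truncates toward zero and a one-dimensional list of coefficients. The step size, the helper names, and the sample values are assumptions for illustration only.

```python
def quantize(value, step):
    """Toy uniform quantizer truncating toward zero (an assumption)."""
    return int(value / step)


def base_layer(x, bmc, step):
    """Adder 111, quantizer 131 and dequantizer 171 in simplified form."""
    bq = [quantize(xi - pi, step) for xi, pi in zip(x, bmc)]
    deq = [q * step for q in bq]  # dequantized value 30
    return bq, deq


def enhancement_error(x, emc, bmc, bq, deq):
    """Adder 110 forms X - EP; adder 121 subtracts the dequantized BQ."""
    ec = []
    for xi, e, b, q, d in zip(x, emc, bmc, bq, deq):
        ep = e if q == 0 else b  # selector 300
        ec.append((xi - ep) - d)
    return ec


x, emc, bmc = [100, 50], [92, 49], [90, 48]
bq, deq = base_layer(x, bmc, step=8)
ec = enhancement_error(x, emc, bmc, bq, deq)
print(bq, deq, ec)  # [1, 0] [8, 0] [2, 1]
```

In the first position BQ is nonzero, so EC is the base layer quantization error (10 - 8 = 2). In the second position BQ is "0", so EC is the enhancement layer prediction error (50 - 49 = 1).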

The quantizer 130 quantizes the signal EC by using the 
quantization scale Q_scale supplied from the encoding 
controller 420 in accordance with the buffer capacity. The 
quantizer 130 supplies the quantized output to a variable- 
length encoder 140 and a dequantizer 170. The variable- 
length encoder 140 separately performs variable-length 
encoding for the quantized prediction error signal and the 
side information such as the mode information supplied 
from the encoding controller 420 and supplies the variable- 
length codes to a multiplexer 150. 

The multiplexer 150 multiplexes these variable-length 
codes and supplies the multiplexed signal to the output 
buffer 160. The output buffer 160 temporarily holds the 
signal and outputs the signal as an encoded bit stream 20 to 
a transmission line or a storage medium. Also, the output 
buffer 160 feeds the capacity of the buffer to the encoding 
controller 420. Upon receiving the buffer capacity, the 
encoding controller 420 generates the optimum quantization 
scale Q_scale corresponding to the capacity and supplies 
the quantization scale Q_scale to the quantizer 130 and the 
variable-length encoder 140. 

The quantized value supplied from the quantizer 130 to 
the dequantizer 170 is dequantized. An adder 180 adds the 
dequantized value to the output 30 supplied from the 
dequantizer 171 in the base layer, thereby reconstructing the 
prediction error signal. 

The adder 190 adds the prediction error signal recon- 
structed by the adder 180 to the motion compensation 
prediction value (EP) supplied from the selector 300, cal- 
culating a reconstructed value in the transform coefficient 
domain. The reconstructed value is supplied to the motion 
compensation prediction section 600. 
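
The local reconstruction performed by the adders 180 and 190 can be sketched in the same simplified setting. Uniform dequantization by multiplication with a step size is an assumption, as are the function name and the sample values.

```python
def reconstruct(ep, eq, e_step, bq, b_step):
    """Rebuild transform coefficients from both layers.

    ep: prediction chosen by the selector 300
    eq: quantized enhancement layer values (output of quantizer 130)
    bq: quantized base layer values (output of quantizer 131)
    """
    recon = []
    for p, qe, qb in zip(ep, eq, bq):
        error = qe * e_step + qb * b_step  # adder 180
        recon.append(p + error)            # adder 190
    return recon


recon = reconstruct(ep=[90, 49], eq=[1, 0], e_step=2, bq=[1, 0], b_step=8)
print(recon)  # [100, 49]
```

With an input of [100, 50], the first coefficient is recovered exactly and the second is off by the enhancement layer quantization error only.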

FIG. 6A is a block diagram of the motion compensation 
prediction sections 600 and 601. Each of the motion com- 
pensation prediction sections 600 and 601 comprises an 
arbitrary-shape inverse orthogonal transform circuit 610, a 
frame memory 620, a motion compensation circuit 630, and 
an arbitrary-shape orthogonal transform circuit 640. The 
arbitrary-shape inverse orthogonal transform circuit 610 
inversely orthogonally transforms a reconstructed picture 
signal as an input signal in accordance with an alpha-map 
signal. The frame memory 620 temporarily holds the 
inversely orthogonally transformed signal in units of frames. 
The motion compensation circuit 630 receives the informa- 
tion of a motion vector, extracts a picture in a position 
indicated by the motion vector in units of frames, and 
supplies the extracted picture to the arbitrary-shape orthogo- 
nal transform circuit 640. The arbitrary-shape orthogonal 
transform circuit 640 orthogonally transforms the supplied 
picture in accordance with the alpha-map signal. In other 
words, the arbitrary-shape orthogonal transform circuit 640 
orthogonally transforms the motion compensation predic- 
tion value of an arbitrary shape, thereby obtaining a motion 
compensation prediction value in a transform coefficient 
domain. 

In this configuration, a reconstructed value in a transform 
coefficient domain is supplied to the motion compensation 
prediction sections 600 and 601. The arbitrary-shape inverse 
orthogonal transform circuit 610 in this motion compensa- 
tion prediction section inversely transforms the recon- 
structed value into a reconstructed picture signal in accor- 
dance with the alpha-map signal 50 which is separately 
supplied. The reconstructed picture signal is stored in the 
frame memory 620. 

Of the reference pictures stored in the frame memory 620, 
a picture in a position indicated by the motion vector is 
extracted by the motion compensation circuit 630 in the 
motion compensation prediction section in units of blocks, 
and the extracted picture is supplied to the arbitrary shape 
orthogonal transform circuit 640 in the motion compensa- 
tion prediction section. Upon receiving the blocks of the 
picture, the arbitrary shape orthogonal transform circuit 640 
orthogonally transforms the picture blocks in accordance 
with the externally supplied alpha-map signal 50, thereby 
orthogonally transforming the motion compensation predic- 
tion value of an arbitrary shape. Consequently, the arbitrary 
shape orthogonal transform circuit 640 can calculate and 
output the motion compensation prediction value in the 
transform coefficient domain. In a block containing the 
boundary between the object and the background, transform 
coefficients of both the object and the background are 
calculated. 

From the reconstructed value in the transform coefficient 
domain, the motion compensation prediction sections 600 
and 601 calculate the motion compensation prediction val- 
ues EMC and BMC in the transform coefficient domain and 
supply the values to the selector 300. 

The foregoing is the explanation of the encoding appa- 
ratus of the second embodiment. The decoding apparatus of 
the second embodiment will be described below. 

FIG. 7 is a block diagram of the decoding apparatus of the 
present invention. 

This configuration differs from FIG. 4 in three points; that 
is, a motion compensation prediction section 602 is provided 
instead of the motion compensation prediction section 201, 
a motion compensation prediction section 603 is provided 
instead of the motion compensation prediction section 203, 
and a demultiplexer 157 is provided instead of the demul- 
tiplexer 153. 

Each of the motion compensation prediction sections 602 
and 603 performs motion compensation prediction by refer- 
ring to an alpha-map signal. The demultiplexer 153 demul- 
tiplexes a quantized value of a transform coefficient, a 
motion vector, and side information such as a quantization 
scale and transfers the demultiplexed signals to the variable- 
length decoder 143. The demultiplexer 157 additionally has 
a function of demultiplexing an alpha-code and transferring 
the demultiplexed codes to an alpha-map decoding appara- 
tus (not shown). 

In this configuration, a base layer bit stream 23 which is 
formed by encoding and multiplexing a quantized value of 
a transform coefficient, a motion vector, side information 
such as a quantization scale, and an alpha-code is input to the 
input stage of the base layer. This bit stream 23 is stored in 
an input buffer 167 and supplied to the demultiplexer 157. 

The demultiplexer 157 demultiplexes the bit stream into 
the quantized value of the transform coefficient, the motion 
vector, the side information, and the alpha-code. Of these 
demultiplexed codes, the quantized value of the transform 
coefficient, the motion vector, and the side information are 
supplied to a variable-length decoder 143 and decoded into 
signals of the quantized value of the transform coefficient, 
the motion vector, and the quantization scale. Note that a 
code (alpha-code) 56 of an alpha-map signal is supplied to 
an alpha-map decoding apparatus (not shown) where the 
code is converted into the alpha-map signal, and the signal 
is supplied to the motion compensation prediction sections 
602 and 603. 

Of the signals decoded by the variable-length decoder 
143, the quantized value of the transform coefficient is 
dequantized by a dequantizer 173 and supplied to an adder 
193. The adder 193 adds the dequantized transform 
coefficient and a motion compensation prediction value in a 
transform coefficient domain supplied from the motion 
compensation prediction section 603, thereby calculating a 
reconstructed value in the transform coefficient domain. 

This reconstructed value is supplied to the motion 
compensation prediction section 603 and inversely transformed 
into a reconstructed picture signal by an arbitrary-shape 
inverse orthogonal transform circuit 610. The signal is 
output as an output reconstructed picture signal 41 and 
stored in a frame memory 620 (FIG. 6B). 

An enhancement layer bit stream 22 formed by encoding 
and multiplexing signals of a quantized value of a transform 
coefficient and side information such as a quantization scale 
is input to the input stage of the enhancement layer. The bit 
stream 22 is stored in an input buffer 162 and supplied to a 
demultiplexer 152. The demultiplexer 152 demultiplexes the 
bit stream into a code of the quantized value of the transform 
coefficient and a code of the side information. 

The codes demultiplexed by the demultiplexer 152 are 
supplied to a variable-length decoder 142 and decoded into 
signals of the quantized value of the transform coefficient 
and the quantization scale. The quantized value of the 
transform coefficient is dequantized by a dequantizer 172 
and supplied to an adder 181. The adder 181 adds this 
dequantized value to a dequantized value 31 supplied from 
the dequantizer 173 of the base layer and supplies the sum 
to an adder 192. 

The adder 192 adds the output from the adder 181 and the 
signal EP supplied from the selector 300 and thereby 
calculates a reconstructed value in a transform coefficient 
domain. This reconstructed value is supplied to the motion 
compensation prediction section 602 and inversely 
transformed into a reconstructed picture signal by an 
arbitrary-shape inverse orthogonal transform circuit 610 (FIG. 6B) 
provided in the motion compensation prediction section 602. 
The reconstructed picture signal is output as an output 
reconstructed picture signal 40 and stored in a frame 
memory 620 provided in the motion compensation 
prediction section 602. 

Of the reference pictures stored in the frame memory 620, 
a picture in a position indicated by the motion vector is 
extracted by a motion compensation circuit 630 (FIG. 6B) in 
the motion compensation prediction section in units of 
blocks, and the extracted picture is supplied to an 
arbitrary-shape orthogonal transform circuit 640 in the motion 
compensation prediction section. The arbitrary-shape orthogonal 
transform circuit 640 orthogonally transforms the motion 
compensation prediction value of an arbitrary shape in 
accordance with an externally supplied alpha-map signal 50, 
thereby calculating and outputting a motion compensation 
prediction value in a transform coefficient domain. For a 
block containing the boundary between the object and the 
background, transform coefficients for both the object and 
the background are calculated. 

In this way, the motion compensation prediction sections 
600 and 601 calculate the motion compensation prediction 
values EMC and BMC in the transform coefficient domain 
from the reconstructed value in the transform coefficient 
domain and supply these values to the selector 300. 

One modification of the motion compensation prediction 
sections 600, 601, 602, and 603 of the second embodiment 
will be described below with reference to FIGS. 8A and 8B. 
This modification is accomplished by expanding a 
background prediction system (e.g., Miyamoto et al.: "Adaptive 
Predictive Coding System Using Background Prediction", 
PCSJ88, 7-4, pp. 93-94, Watanabe et al.: "Adaptive 
Four-Difference-DCT Coding System", PCSJ88, 8-2, pp. 
117-118), which is conventionally used to improve the 
efficiency of encoding of a background region hidden by the 
movement of an object, so that the system is also usable for 
overlapping of objects. 

As shown in FIGS. 8A and 8B, the motion compensation 
prediction section comprises an arbitrary-shape inverse 
orthogonal transform circuit 610, an SW circuit 650, frame 
memories 621, 622, and 623, an SW circuit 651, a motion 
compensation circuit 630, and an arbitrary-shape orthogonal 
transform circuit 640. 

The arbitrary-shape inverse orthogonal transform circuit 
610 inversely transforms a reconstructed picture signal in 
accordance with an alpha-map signal. When a reconstructed 
value in a transform coefficient domain is input to the motion 
compensation prediction section, this reconstructed value is 
supplied to the arbitrary-shape inverse orthogonal transform 
circuit 610 as one component of the motion compensation 
prediction section. The circuit 610 inversely transforms the 
reconstructed value into a reconstructed picture signal in 
accordance with an alpha-map signal. This alpha-map signal 
is supplied from an alpha-map decoding apparatus (not 
shown) provided in the decoding system. 

The reconstructed picture signal inversely transformed by 
the arbitrary-shape inverse orthogonal transform circuit 610 
is stored in one of the frame memories 621, 622, and 623. 
The SW circuit 650 selects one of the frame memories 621, 
622, and 623 into which the signal is stored. For example, 
of the frame memories 621, 622, and 623, the frame 
memories 621 and 622 are used to store object pictures, and the 
frame memory 623 is used to store background pictures. The 
object frame memories are prepared for two frames to 
separately hold pictures of two different objects appearing in 
a frame. If three or more different objects exist, it is only 
necessary to prepare the number of object frame memories 
corresponding to the number of objects and allow the switch 
circuit 650 to select a corresponding frame memory. 

The SW circuit 650 can store the reconstructed picture 
signal from the arbitrary-shape inverse orthogonal transform 
circuit 610 in one of the frame memories 621, 622, and 623 
in accordance with an alpha-map signal by opening or 
closing the switch in accordance with the alpha-map signal. 

The SW circuit 651 opens or closes the switch in 
accordance with the alpha-map signal, thereby selecting one of 
the frame memories 621, 622, and 623 in accordance with 
the alpha-map signal and reading out the reconstructed 
picture signal stored in that memory. Of the reconstructed 
picture signal (reference picture) read out from the frame 
memory 621, 622, or 623 via the SW circuit 651, the motion 
compensation circuit 630 extracts a picture in a position 
indicated by a motion vector in units of blocks and supplies 
the extracted picture to the arbitrary-shape orthogonal 
transform circuit 640. 

The arbitrary-shape orthogonal transform circuit 640 
orthogonally transforms the reconstructed picture signal of 
the picture in the position indicated by the motion vector, 
which is read out from the frame memory 621, 622, or 623 
via the SW circuit 651, on the basis of the alpha-map signal, 
thereby orthogonally transforming a motion compensation 
prediction value of a picture of an arbitrary shape indicated 
by the alpha-map signal. That is, the circuit 640 calculates 
and outputs the motion compensation prediction value in the 
transform coefficient domain. 

In the configuration shown in FIGS. 8A and 8B, it is 
assumed that the alpha-map supplied to the arbitrary-shape 
inverse orthogonal transform circuit 610 can specify one of 
a plurality of objects and a background to which a pixel 
belongs. 

In this configuration, a reconstructed value in a transform 
coefficient domain is supplied to the motion compensation 
prediction section. A portion of this reconstructed value, i.e., 
a picture of an arbitrary shape indicated by an alpha-map is 
inversely orthogonally transformed into a reconstructed pic- 
ture signal by the arbitrary-shape inverse orthogonal trans- 
form circuit 610. The reconstructed picture signal is stored 
in one of the frame memories 621, 622, and 623 for the 
objects and the background by the SW circuit 650 in 
accordance with the alpha-map signal. 
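
The role of the two SW circuits can be illustrated with a small sketch. It assumes a per-pixel alpha-map whose labels are 0 for the background and 1 or 2 for the objects, and it models the frame memories as a dictionary keyed by label. These conventions and all names are hypothetical and are not taken from FIGS. 8A and 8B.

```python
def make_memories(labels, height, width):
    """One frame memory per alpha label; None marks pixels never written."""
    return {
        label: [[None] * width for _ in range(height)] for label in labels
    }


def store_by_alpha(recon, alpha, memories):
    """SW circuit 650: write each pixel into the memory its label selects."""
    for y, row in enumerate(recon):
        for x, pixel in enumerate(row):
            memories[alpha[y][x]][y][x] = pixel


def read_by_alpha(alpha, memories):
    """SW circuit 651: read each pixel from the memory its label selects."""
    return [
        [memories[label][y][x] for x, label in enumerate(row)]
        for y, row in enumerate(alpha)
    ]


memories = make_memories(labels=(0, 1, 2), height=2, width=2)
# Frame 1: object 1 covers the bottom-left pixel.
store_by_alpha([[5, 6], [7, 8]], [[0, 0], [1, 0]], memories)
# Frame 2: object 1 has moved to the top-left pixel.
alpha2 = [[1, 0], [0, 0]]
store_by_alpha([[7, 6], [4, 8]], alpha2, memories)

print(memories[0])  # [[5, 6], [4, 8]]
print(read_by_alpha(alpha2, memories))  # [[7, 6], [4, 8]]
```

The background memory still holds the value 5 at the top-left pixel although the object hides it in frame 2, which is the property that helps predict a region uncovered later.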

These stored signals are sequentially selected and read out 
by the SW circuit 651 in accordance with the alpha-map 
signal and supplied to the motion compensation circuit 630 
where a motion compensation prediction value is calculated. 

As described above, to calculate the motion compensation 
prediction value in the motion compensation circuit 630, the 
SW circuit 651 forms the motion compensation prediction 
value from the reference pictures stored in the frame memo- 
ries. This improves the efficiency of encoding of a region 
which is hidden by overlapping of objects. 

The motion compensation prediction value calculated by 
the motion compensation circuit 630 is supplied to the 
arbitrary-shape orthogonal transform circuit 640 and 
orthogonally transformed on the basis of the alpha-map 
signal. The result is an orthogonal transform coefficient of 
the motion compensation prediction value of the picture of 
the arbitrary shape indicated by the alpha-map signal. 

In this modification as described above, to obtain the 
motion compensation prediction value in the motion com- 
pensation circuit 630, the motion compensation prediction 
value is formed for each of pictures stored in the frame 
memories which separately store pictures of the objects and 
the background in accordance with an alpha-map signal. 
This improves the efficiency of encoding of a region hidden 
by overlapping of the objects. 

Other configurations of the quantizers 130 and 131 and 
the dequantizers 170, 171, 172, and 173 in the first and 
second embodiments will be described below with reference 
to FIGS. 2 and 9 to 11. 

Quantization matrices shown in FIGS. 9 and 10 are 
described in TM5 as a test model of MPEG2. In each of 
FIGS. 9 and 10, the matrix is represented by a two- 
dimensional matrix in a horizontal direction (h) and a 
vertical direction (v) with respect to 8x8 transform coeffi- 
cients. 

The following equations show examples of quantization 
and dequantization using the quantization matrices in FIGS. 
9 and 10. 



Quantization: 

$$\mathrm{level}(v,h) = \mathrm{sign}(\mathrm{coef}(v,h)) \cdot \frac{|\mathrm{coef}(v,h)| \cdot 16 / w(v,h)}{2 \cdot \mathrm{Q\_scale}}$$

Inverse quantization: 

$$\mathrm{coef}(v,h) = \mathrm{sign}(\mathrm{level}(v,h)) \cdot \left( 2 \cdot |\mathrm{level}(v,h)| \cdot \frac{w(v,h)}{16} + 1 \right) \cdot \mathrm{Q\_scale}$$

where 
coef(v,h): transform coefficient 
level(v,h): quantized value 
coef(v,h): transform coefficient (reconstructed value) 
w(v,h): quantization matrix 
Q_scale: quantization scale 
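
A minimal sketch of matrix-weighted quantization and dequantization in the TM5 style is given below. It assumes the forms level = sign(coef) * (|coef| * 16 / w) / (2 * Q_scale) and coef = sign(level) * (2 * |level| * w / 16 + 1) * Q_scale, and it assumes that the quantizer truncates to an integer; the truncation rule and the function names are assumptions, not statements of the method used in the embodiments.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def quantize(coef, w, q_scale):
    """Weight by 16/w, divide by 2*Q_scale, truncate toward zero."""
    return sign(coef) * ((abs(coef) * 16) // (w * 2 * q_scale))


def dequantize(level, w, q_scale):
    """Reconstruct at the assumed midpoint of the quantization interval."""
    return sign(level) * (2 * abs(level) * w / 16 + 1) * q_scale


level = quantize(100, w=16, q_scale=4)
print(level)                                # 12
print(dequantize(level, w=16, q_scale=4))   # 100.0
print(quantize(3, w=16, q_scale=4))         # 0
```

A larger weight w(v,h) coarsens the step for that coefficient: with w = 32 the same input gives level 6 instead of 12.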

The modification is related to a quantization matrix for 
changing the weight of a quantization step size for each 
transform coefficient. In the SNR scalability, quantization in 
the enhancement layer is performed more finely than in the 
base layer. 

In the base layer, therefore, the quantization matrices as 
shown in FIGS. 9 and 10 are used to finely quantize 
low-frequency transform coefficients and roughly quantize 
high-frequency coefficients. If this is the case, the subjective 
evaluation often improves when encoding is performed at 
the same encoding rate as when encoding is performed with 
a fixed quantization step size. Also, the coding efficiency is 
increased by increasing the occurrence probability of 0 by 
enlarging the center dead zone in a quantizer. This improves 
the quality of reconstructed pictures at low rates. 

In the enhancement layer, on the other hand, if high- 
frequency transform coefficients are roughly quantized, 
fine textures are not reconstructed, which results in visual 
degradation. This also increases the influence of feedback 
quantization noise in high-frequency transform coefficients. 

Accordingly, in the enhancement layer a quantization 
matrix is used only for a transform coefficient whose quan- 
tized value BQ in the base layer is not "0". FIG. 11 shows 
an example of a quantization matrix obtained for the 
example shown in FIG. 2. When this matrix is used, the 
quantization error of a transform coefficient whose motion 
compensation prediction error is large is increased in the 
enhancement layer. However, a quantization error in a 
largely changing portion is inconspicuous due to the mask- 
ing effect of visual characteristics, and the resulting visual 
degradation is slight. 
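
One way to picture such an enhancement layer matrix is sketched below. The sketch assumes that positions where BQ is "0" receive a flat weight of 16 (so that w/16 equals 1 and no frequency weighting is applied); that neutral value, the function name, and the sample matrix are assumptions and are not read from FIG. 11.

```python
FLAT_WEIGHT = 16  # assumed neutral weight: w / 16 == 1


def enhancement_matrix(base_matrix, bq):
    """Keep the matrix weight only where the base layer value BQ is nonzero."""
    return [
        [w if q != 0 else FLAT_WEIGHT for w, q in zip(wrow, qrow)]
        for wrow, qrow in zip(base_matrix, bq)
    ]


base_matrix = [[16, 22], [22, 26]]  # made-up weights
bq = [[5, 0], [-1, 0]]
print(enhancement_matrix(base_matrix, bq))  # [[16, 16], [22, 16]]
```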

An example of transform to a one-dimensional sequence 
performed when a quantized transform coefficient is 
variable-length-encoded in the first and second embodi- 
ments will be described below with reference to FIGS. 2, 12, 
and 13. This transform to a one-dimensional sequence is 
generally done by using a transform method called zigzag 
scan shown in FIG. 12. 

FIG. 12 shows a two-dimensional matrix divided into 
eight portions in each of a horizontal direction (h) and a 
vertical direction (v). In FIG. 12, 8x8 transform coefficients 
are arranged in increasing order of the numbers given in the 
cells. Consequently, low-frequency transform coefficients 
are arranged first and high-frequency transform coefficients 
are arranged next. Therefore, the larger the ordinal number 
for a quantized value, the higher the probability of "0", and 
this increases the coding efficiency when a combined event 
of the number of 0 runs and quantized values after the 0 runs 
is variable-length-encoded. This makes use of the property 
that lower-frequency coefficients have higher power. 
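
The zigzag ordering and the (zero run, value) events can be sketched as follows. The sketch uses the conventional zigzag pattern, which is assumed to match FIG. 12, and a 4x4 block with made-up values to keep the example short.

```python
def zigzag_order(n=8):
    """Return (v, h) positions of an n x n block in conventional zigzag order."""
    return sorted(
        ((v, h) for v in range(n) for h in range(n)),
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]),
    )


def run_level_events(block, order):
    """Pair each nonzero value with the number of zeros scanned before it."""
    events, run = [], 0
    for v, h in order:
        q = block[v][h]
        if q == 0:
            run += 1
        else:
            events.append((run, q))
            run = 0
    return events


block = [
    [9, 2, 0, 0],
    [3, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 0],
]
print(zigzag_order(4)[:6])
# [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
print(run_level_events(block, zigzag_order(4)))
# [(0, 9), (0, 2), (0, 3), (6, 1)]
```

Trailing zeros after the last nonzero value produce no event here; how the end of a block is signaled is left out of this sketch.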

In this example, therefore, the scan order in the enhance- 
ment layer is such that transform coefficients in positions 
where the quantized value BQ of the base layer is not "0" are 
arranged before transform coefficients in positions where 
BQ is "0" so that the transform coefficients are arranged in 
increasing order of zigzag scan numbers. 

That is, a transform coefficient in a position where BQ is 
"0" is a motion compensation prediction error signal in the 
enhancement layer, and a transform coefficient in a position 
where BQ is not "0" is a quantization error in the base layer. 
Accordingly, the above method is based on the assumption 
that the statistical properties of the two kinds of transform 
coefficients are different. FIG. 13 shows a scan order 
corresponding to the example shown in FIG. 2. In FIG. 13, the 
order of the numbers given in the cells is the scan order. 
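
The enhancement layer scan order described above can be sketched as a stable partition of the zigzag order. The conventional zigzag pattern and the 3x3 sample values are assumptions made to keep the example small; they are not the values of FIG. 2 or FIG. 13.

```python
def zigzag_order(n):
    """Return (v, h) positions of an n x n block in conventional zigzag order."""
    return sorted(
        ((v, h) for v in range(n) for h in range(n)),
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]),
    )


def enhancement_scan(bq):
    """Positions with BQ != 0 first, then BQ == 0, each in zigzag order."""
    order = zigzag_order(len(bq))
    nonzero = [p for p in order if bq[p[0]][p[1]] != 0]
    zero = [p for p in order if bq[p[0]][p[1]] == 0]
    return nonzero + zero


bq = [
    [5, 0, 1],
    [0, 2, 0],
    [0, 0, 0],
]
print(enhancement_scan(bq))
# [(0, 0), (1, 1), (0, 2), (0, 1), (1, 0), (2, 0), (1, 2), (2, 1), (2, 2)]
```

Because the decoder already holds BQ from the base layer, it can derive the same order without any extra side information.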

In the above example, it is assumed that transform bases 
do not overlap between blocks. On the other hand, reference: 
"Ikuzawa et al, Video Encoding Using Motion Compensa- 
tion Filter Bank Structure, PCSJ92,8-5, 1992" has proposed 
an encoding method using a motion compensation filter 
bank structure in which the coding efficiency decreases 
little even when bases overlap each other because a 
transformed difference arrangement is used. 

The concept of the above reference is applicable to a 
prediction encoding apparatus (transformed difference 
arrangement) in an orthogonal transform coefficient domain 
as in the present invention. Therefore, the motion compen- 
sation filter bank structure can be applied to the above 
example. 

In the second embodiment described above, a frame of a 
video picture is orthogonally transformed by dividing the 
frame into matrices each having a predetermined number 
(NxN) of pixels to obtain transform coefficients for indi- 
vidual bands of the spatial frequency, and motion compen- 
sation is performed in a transform coefficient domain for 
each of the NxN transform coefficients in each of upper and 
lower layers. In this video encoding, motion compensation 
is performed for a picture in an interest region by using 
alpha-map information. When this motion compensation is 
performed, whether motion compensation prediction is cor- 
rect is checked on the basis of an already decoded and 
quantized value in the lower layer (base layer). If the motion 
compensation prediction is correct, encoding in the upper 
layer (enhancement layer) is performed by using a motion 
compensation prediction value obtained for the upper layer 
and having a smaller encoding distortion. If the motion 
compensation prediction is incorrect, encoding in the upper 
layer is done by using a motion compensation prediction 
value obtained for the lower layer (base layer) and having a 
larger encoding distortion than that in the enhancement 
layer. This improves the coding efficiency for coefficients 
whose motion compensation prediction is incorrect and 
realizes an encoding system capable of encoding with 
little decrease in the coding efficiency. 

Accordingly, the resolution and the image quality can be 
varied in an arbitrary-shape picture encoding apparatus 
which separately encodes the background and the object. In 
addition, it is possible to provide scalable encoding and 
decoding apparatuses having a high coding efficiency. 

The third embodiment will be described below with 
reference to FIG. 14. 

As shown in FIG. 14, in blocks (enclosed by the solid 
lines) containing the boundary between the object and the 
background, motion vectors are separately detected for the 
object and the background. Since the number of pixels of 
either the object or the background decreases, the influence 
of noise increases, and this decreases the reliability of the 
motion vector. 

In blocks in the boundary, therefore, the motion vector 
detection range is made narrower than that in other blocks 
(indicated by the broken lines). 

Also, the object in a current frame has moved from the 
object in a reference frame. Therefore, erroneous detection 
of a motion vector can be reduced by limiting the motion 
vector search range for the object to the inside of the object 
in the reference frame. Limiting the search range also has an 
effect of decreasing the amount of motion vector search 
calculations. Likewise, a motion vector for the background is 
detected from a background portion of the reference frame. 

As described above, a large error can be prevented by 
making the motion vector detection range in blocks in the 
boundary narrower than that in other blocks (indicated by 
the broken lines). 
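
The narrowing of the detection range can be sketched as follows. The range values of 7 and 3 pixels, the label convention of the alpha-map block, and the function names are assumptions chosen for illustration; the text does not specify them.

```python
def is_boundary_block(alpha_block):
    """A block is on the boundary if it holds more than one alpha label."""
    labels = {a for row in alpha_block for a in row}
    return len(labels) > 1


def candidate_vectors(alpha_block, normal_range=7, boundary_range=3):
    """List the (dy, dx) candidates, using the narrower range on the boundary."""
    r = boundary_range if is_boundary_block(alpha_block) else normal_range
    return [(dy, dx) for dy in range(-r, r + 1) for dx in range(-r, r + 1)]


inside = [[1, 1], [1, 1]]    # block entirely inside the object
boundary = [[0, 1], [1, 1]]  # block containing object and background
print(len(candidate_vectors(inside)))    # 225
print(len(candidate_vectors(boundary)))  # 49
```

Under these assumed ranges a boundary block evaluates 49 candidates instead of 225, which also reflects the reduction in search calculations mentioned above.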

Finally, as an application of the present invention, an 
embodiment of a video transmission system to which the 
video encoding and decoding apparatuses of the present 
invention are applied will be described below with reference 
to FIGS. 15A to 15C. 

In this system as shown in FIG. 15A, an input video signal 
from a camera 1002 of a personal computer (PC) 1001 is 
encoded by a video encoding apparatus incorporated into the 
PC 1001. The encoded data output from the video encoding 
apparatus is multiplexed with other data of sounds or 
information. The multiplexed data is transmitted by radio 
from a radio transceiver 1003 and received by another radio 
transceiver 1004. 

The signal received by the radio transceiver 1004 is 
demultiplexed into the encoded data of the video signal and 
the data of sounds or information. The encoded data of the 
video signal is decoded by a video decoding apparatus 
incorporated into a workstation (EWS) 1005 and displayed 
on the display of the EWS 1005. 

An input video signal from a camera 1006 of the EWS 
1005 is encoded in the same fashion as above by using a 
video encoding apparatus incorporated into the EWS 1005. 
The encoded data of the video signal is multiplexed with 
other data of sounds or information. The multiplexed data is 
transmitted by radio from the radio transceiver 1004 and 
received by the radio transceiver 1003. The signal received 
by the radio transceiver 1003 is demultiplexed into the 
encoded data of the video signal and the data of sounds or 
information. The encoded data of the video signal is decoded 
by a video decoding apparatus incorporated into the PC 
1001 and displayed on the display of the PC 1001. 

FIG. 15B is a block diagram schematically showing the 
video encoding apparatus incorporated into the PC 1001 and 
the EWS 1005 in FIG. 15A. FIG. 15C is a block diagram 
schematically showing the video decoding apparatus incor- 
porated into the PC 1001 and the EWS 1005 in FIG. 15A. 

The video encoding apparatus in FIG. 15B comprises an 
information source encoding section 1102 which receives a 
picture signal from a video input section 1101 such as a 
camera and has an error resilience processor 1103, and a 
transmission line encoding section 1104. The information 
source encoding section 1102 performs discrete cosine trans- 
form (DCT) for a prediction error signal and quantizes the 
resulting DCT coefficients. The transmission line encoding 
section 1104 performs variable-length encoding, error detec- 
tion for encoded data, and error correcting coding. The 
encoded data output from the transmission line encoding 
section 1104 is supplied to a radio transceiver 1105 and 
transmitted. The processing in the data source encoding 
section 1102 and the variable-length encoding in the trans- 
mission line encoding section 1104 are done by applying 
processing methods such as explained in the embodiments 
of the present invention. 

The video decoding apparatus shown in FIG. 15C com- 
prises a transmission line decoding section 1202 and a data 
source decoding section 1203 having an error resilience 
processor 1204. The transmission line decoding section 
1202 receives encoded data received by a radio transceiver 
1201 and performs processing which is the reverse of the 
processing done by the transmission line encoding section 
1104 for the input encoded data. The data source decoding 
section 1203 receives the output signal from the transmission
line decoding section 1202 and performs processing
which is the reverse of the processing done by the information
source encoding section 1102 for the input signal. The
picture decoded by the data source decoding section 1203 is 
output by a video output section 1205 such as a display.

The decoding processing in these sections is performed by
applying processing methods such as explained in the 
embodiments of the present invention. As has been 
described above, the present invention accomplishes scal- 
able encoding in which the quality of an arbitrary-shape 
picture can be varied step by step without largely decreasing
the coding efficiency. Also, in the present invention, it is 
possible to decrease the amount of generated codes when 
DCT is performed for a picture of an arbitrary shape. 
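
The reduction in generated codes for an arbitrary-shape picture can be illustrated with a minimal sketch of the background padding that precedes the DCT. The sketch assumes that the alpha-map is given as a binary array (1 for the object, 0 for the background), that a background pixel is close to the object when at least one of its four neighbours is an object pixel, and that the correction is a simple average of the filled value with those neighbouring object pixels. These choices and the name pad_background are assumptions made for illustration only.

```python
def pad_background(block, alpha):
    """Fill background pixels of one N x N block before the transform.

    block: list of rows of pixel values.
    alpha: list of rows, 1 for object pixels and 0 for background pixels
           (binary alpha-map layout is an assumption of this sketch).
    """
    n = len(block)
    obj = [block[y][x] for y in range(n) for x in range(n) if alpha[y][x]]
    if not obj:
        # No object pixel in the block: nothing to pad against.
        return [row[:] for row in block]

    # Step 1: assign the average of the object pixels to the background.
    mean = sum(obj) / len(obj)
    filled = [[block[y][x] if alpha[y][x] else mean for x in range(n)]
              for y in range(n)]

    # Step 2: pull background pixels that touch the object toward the
    # neighbouring object values (assumed rule: plain average).
    out = [row[:] for row in filled]
    for y in range(n):
        for x in range(n):
            if alpha[y][x]:
                continue
            near = [filled[j][i]
                    for j, i in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                    if 0 <= j < n and 0 <= i < n and alpha[j][i]]
            if near:
                out[y][x] = (filled[y][x] + sum(near)) / (len(near) + 1)
    return out
```

Object pixels are left unchanged. Only the background is altered, so that the step at the object boundary, and with it the high-frequency content of the transformed block, is reduced.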

Additional advantages and modifications will readily 
occur to those skilled in the art. Therefore, the invention in
its broader aspects is not limited to the specific details, 
representative devices, and illustrated examples shown and 
described herein. Accordingly, various modifications may be 
made without departing from the spirit or scope of the 
general inventive concept as defined by the appended claims
and their equivalents. 

What is claimed is: 

1. A video encoding apparatus comprising: 

means for dividing an input video signal into a plurality
of blocks each containing NxN pixels;

means for encoding and outputting an alpha-map signal
for discriminating a background of a picture from an
object thereof;

means for calculating an average value of pixels of the
object in units of blocks for the picture, using the 
alpha-map signal; 

means for assigning the average value obtained by said 
means for calculating an average value to a pixel of the 
background of a block; 

means for deciding, using the alpha-map signal, whether 
a pixel in the background is close to the object; 

means for correcting the pixel in the background which is 
decided to be close to the object as a decision result of
said deciding means in such a manner that a pixel value 
of the pixel comes close to a pixel value of the object 
near the pixel; and 

means for orthogonally transforming the block including 
the pixel corrected by said correcting means to obtain
a plurality of orthogonal transform coefficients. 

2. The video encoding apparatus according to claim 1, 
wherein said correcting means includes means for generat- 
ing new pixel values of the pixels to be altered in the 
background, using pixel values of the pixels in the object
which are adjacent to the background. 

3. The video encoding apparatus according to claim 1, 
wherein said correcting means includes means for generat- 
ing new pixel values of pixels to be altered, using pixel 
values of pixels adjacent to the pixels to be altered.

4. The video encoding apparatus according to claim 1, 
wherein said correcting means includes means for generat- 
ing new pixel values of the pixels to be altered in the 
background, using pixel values of the pixels in the object 
which are near to the background.

5. The video encoding apparatus according to claim 1, 
wherein said correcting means includes means for generat- 
ing new pixel values of pixels to be altered, using pixel 
values of pixels near to the pixels to be altered. 

6. The video encoding apparatus according to claim 1,
wherein said correcting means includes means for altering 
values of the pixels in the object.

7. The video encoding apparatus according to claim 2,
wherein said generating means generates the new pixel 
values of the pixels to be altered in the background, using 
pixels of the background as well as pixel values of the pixels 
in the object which are adjacent to the background. 

8. The video encoding apparatus according to claim 7, 
wherein said generating means generates the new pixel 
values of the pixels to be altered, using current pixel values 
of the pixels to be altered as well as the pixel values of the 
pixels adjacent to the pixels to be altered. 

9. The video encoding apparatus according to claim 4, 
wherein said generating means generates the new pixel 
values of the pixels to be altered in the background, using 
pixel values of pixels of the background as well as the pixel 
values of the pixels in the object which are near to the 
background. 

10. The video encoding apparatus according to claim 5, 
wherein said generating means generates the new pixel 
values of the pixels to be altered, using current pixel values 
of the pixels to be altered as well as the pixel values of the 
pixels near to the pixels to be altered. 

11. A method of encoding a video picture comprising the 
steps of: 

dividing an input video signal into a plurality of blocks 
each containing NxN pixels; 

encoding and outputting an alpha-map signal for discrimi- 
nating a background of a picture from an object thereof; 

calculating an average value of pixels of the object in 
units of blocks for the picture, using the alpha-map 
signal; 

assigning the average value obtained by said step of 
calculating an average value to a pixel of the back- 
ground of a block; 

deciding, using the alpha-map signal, whether a pixel in 
the background is close to the object; 

correcting the pixel in the background which is decided to 
be close to the object as a decision result of said 
deciding step in such a manner that a pixel value of the 
pixel comes close to a pixel value of the object near the 
pixel; and 

orthogonally transforming the block including the pixel 
corrected by said correcting step to obtain a plurality of 
orthogonal transform coefficients. 

12. The method according to claim 11, wherein said 
correcting step includes generating new pixel values of the 
pixel to be altered in the background, using pixel values of 
the pixels in the object which are adjacent to the back- 
ground. 

13. The method according to claim 11, wherein said 
correcting step includes generating new pixel values of 
pixels to be altered, using pixel values of pixels adjacent to 
the pixels to be altered. 

14. The method according to claim 11, wherein said 
correcting step includes generating new pixel values of the 
pixels to be altered in the background, using pixel values of 
the pixels in the object which are near to the background. 

15. The method according to claim 11, wherein said 
correcting step includes generating new pixel values of 
pixels to be altered, using pixel values of pixels near to the 
pixels to be altered. 

16. The method according to claim 11, wherein said 
correcting step includes altering values of the pixels in the 
object. 

17. The method according to claim 12, wherein said 
generating step generates the new pixel values of the pixels 
to be altered in the background, using pixel values of pixels
in the background as well as the pixel values of the pixels in
the object which are adjacent to the background.

18. The method according to claim 13, wherein said 
generating step generates the new pixel values of the pixels 
to be altered, using current pixel values of the pixels to be 
altered as well as the pixel values of the pixels adjacent to 
the pixels to be altered. 

19. The method according to claim 14, wherein said 
generating step generates the new pixel values of the pixels 
to be altered in the background, using pixel values of pixels
of the background as well as the pixel values of the pixels
in the object which are near to the background. 

20. The method according to claim 15, wherein said 
generating step generates the new pixel values of the pixels 
to be altered, using current pixel values of the pixels to be
altered as well as the pixel values of the pixels near to the 
pixels to be altered. 



* * * * *